ABSTRACT


The design of a sixteen-bit pipelined adder CMOS integrated
circuit is presented. The adder is designed to maximize
throughput and to provide for testability. Tutorial material
on CMOS design is also presented.
I. INTRODUCTION


For several years the ability of systems engineers to
design custom digital integrated circuits has been growing.
The Mead and Conway design methodology described in
Introduction to VLSI Systems [Ref. 1] permits the systems
engineer to be his own logic circuit designer. A prolifera-
tion of computer-aided design (CAD) systems, such as the
MacPitts silicon compiler [Ref. 2], the chip layout language
(CLL) [Ref. 3], the graphics editor Caesar [Ref. 4], and the
Burlap hierarchical layout language [Ref. 5], makes it
possible for the engineer to carry the Mead and Conway design
methodology rapidly through to a final design. This includes
iterative simulation and redesign to provide justifiable
confidence in the final design submitted for fabrication.

Many of the techniques utilized in the Mead and Conway
methodology, and most of the CAD tools, are based on having
the final design implemented in a technology that uses only
one type of doping for the semiconductor material in the
active region of the transistors. Because of their higher
switching speed, negatively doped metal oxide semiconductor
(NMOS) transistor technologies are generally used.

Selecting an NMOS implementation technology does provide
the systems engineer with a complete and proven methodology
for the design of a very large scale integrated (VLSI)
circuit, and it allows the use of many extensively tested CAD
tools. Like any other design decision, however, selecting an
NMOS implementation brings with it some limitations. There
are two primary problems associated with NMOS digital
circuits.
The first is the ultimate switching speed limitation.
Though many NMOS VLSI circuits operate at clock rates in the
8 to 10 MHz range, there are many applications requiring
higher clock rates. The second problem is dissipating the
relatively large amount of power consumed by NMOS digital
circuits. State-of-the-art, commercially available NMOS VLSI
circuits commonly consume 3 to 5 watts. Considerable design
effort is required to ensure that dissipating this much power
on a chip measuring approximately 5 millimeters on a side
does not alter the performance of the micron-sized features
on the chip.

One group of technologies that offers both increased
switching speed and greatly reduced power consumption is
complementary metal oxide semiconductor (CMOS) technology.
CMOS circuits also offer the benefits of greater radiation
hardness and increased noise margin. In this thesis investi-
gation, much of the Mead and Conway methodology was applied
to the design of a CMOS circuit. Caesar, a general-purpose
color graphics CAD tool that has frequently been used in the
design of NMOS circuits, was employed. In carrying out the
design of the 16-bit pipelined high-speed adder in CMOS, two
separate goals were pursued: the first, of course, is speed,
and the second is verifiability. A high-speed adder implies
not only a high clock rate of operation but also a small
latency between the input of the operands and the output of
the sum.
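The difference between clock rate and latency can be made concrete with a small behavioral model. The sketch below assumes a 16-bit adder split into four 4-bit pipeline stages, each passing its carry forward through a pipeline register. The partitioning and the function names (`stage`, `run_pipeline`) are hypothetical and chosen only for illustration. This is not the adder design described in Chapter 4.

```python
WIDTH = 16          # adder width in bits
STAGE_BITS = 4      # bits handled per stage (assumed partitioning)
STAGES = WIDTH // STAGE_BITS
MASK = (1 << STAGE_BITS) - 1

def stage(s, state):
    """Add the s-th 4-bit slice of a and b plus the incoming carry."""
    a, b, partial, carry = state
    shift = s * STAGE_BITS
    t = ((a >> shift) & MASK) + ((b >> shift) & MASK) + carry
    partial |= (t & MASK) << shift
    return (a, b, partial, t >> STAGE_BITS)

def run_pipeline(pairs):
    """Clock in one operand pair per cycle; return (cycle, a, b, sum) tuples.

    regs[s] models the pipeline register after stage s. All registers
    load on the same clock edge, so each stage sees the value that its
    predecessor latched on the previous cycle.
    """
    regs = [None] * STAGES
    pending = list(pairs)
    results = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        cycle += 1
        new_regs = [None] * STAGES
        if pending:
            a, b = pending.pop(0)
            new_regs[0] = stage(0, (a, b, 0, 0))
        for s in range(1, STAGES):
            if regs[s - 1] is not None:
                new_regs[s] = stage(s, regs[s - 1])
        regs = new_regs
        if regs[-1] is not None:
            a, b, total, _carry_out = regs[-1]
            results.append((cycle, a, b, total))  # 16-bit sum, carry out dropped
    return results

for cycle, a, b, total in run_pipeline([(1, 2), (0xFFFF, 1), (1234, 4321)]):
    print(f"cycle {cycle}: {a} + {b} -> {total}")
```

In this model the first sum appears only after four clock cycles, which is the latency. After that, a new sum emerges on every cycle, which is the throughput. Using more and shorter stages would raise the attainable clock rate, but it would also add register delays to the total latency. These are the two goals the paragraph above names.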

A  discussion  of  CMOS  technologies  and  the  implementation 
of  logic  circuits  in  those  technologies  follows  in  Chapter 
2.  Chapter  3  presents  a  description  of  the  CAD  tools  used 
to  construct  and  simulate  the  layout  for  the  adder.  The 
logic  and  layout  design  of  the  adder  is  covered  in  Chapter  4 
and  is  followed  by  a  test  plan  for  the  fabricated  chip  in 
Chapter  5. 
II. CMOS CIRCUITS


Before  the  design  of  CMOS  digital  circuits  can  be 
attempted,  an  understanding  of  how  to  best  implement  logic 
functions  in  CMOS  is  necessary.  It  is  also  important  to  be 
aware  of  the  advantages  and  disadvantages  of  the  different 
CMOS implementation technologies. In this chapter the
operation of CMOS digital circuits is explained using similar
NMOS  circuits  as  a  benchmark  for  comparison.  The  different 
methodologies  for  assembling  the  CMOS  pieces  to  produce  the 
desired  logical  results  are  reviewed  and  the  selection  of 
the  CMOS-Bulk  p-well  implementation  technology  is  explained. 

A. COMPARISON WITH NMOS

In NMOS digital circuits there is only one type of
switching device, namely the n-channel enhancement mode
metal oxide semiconductor (MOS) transistor. The other prin-
cipal device utilized in NMOS circuits is the depletion mode
n-channel MOS device, which acts as a load resistor. In CMOS
there are both n-channel and p-channel enhancement mode
transistors available. As in NMOS, the n-channel device can
be considered on when Vdd (typically +5 volts DC), a logical
1, is present on its gate. The p-channel device can be
considered on when ground (GND), a logical 0, is present on
its gate. Figure 2.1 shows the symbols that will be used for
the n-channel and p-channel transistors in this thesis.

The  basic  differences  between  NMOS  and  CMOS  technologies 
can  be  demonstrated  by  comparing  their  application  to  some 
basic  digital  circuits. 
Figure 2.1 CMOS Transistor Symbols.

1. The Inverter

Figure 2.2 (a) shows an NMOS inverter. Whenever there is
a logical 1 on the input, the voltage drop across the load
resistor is approximately Vdd and the output is a logical 0.
This results in steady-state power consumption. When the
input switches to a logical 0, the load capacitance (CL) on
the output must be charged to Vdd through the load resistor,
which has a resistance of several kilohms, before the output
can assume a logical 1. This results in a much longer
transition from 0 to 1, where the load capacitance is charged
through the load resistor, than from 1 to 0, where the load
capacitance is discharged through the switched-on NMOS
enhancement transistor. The reason for this asymmetry is that
the pull-down transistor's on resistance is typically only
one fourth or less of the on resistance of the pull-up
depletion mode load transistor. The technique of precharging
circuits, where all outputs are set to logical 1 during one
clock cycle and then selectively forced to 0 on the opposite
(evaluation) clock cycle, has proven helpful in gaining
control over the asymmetric switching times. This longer
switching time from 0 to 1 must still be accounted for,
however, and represents the primary limitation to the speed
of NMOS circuits.
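A first-order RC estimate puts numbers on this asymmetry. The sketch treats each transition as a single-pole RC charge or discharge and ignores the load current that still flows while the pull-down discharges the node. The 10-kilohm load and the 0.1 pF capacitance are assumed round values. Only the 4:1 resistance ratio comes from the text.

```python
import math

def rc_delay_ns(r_ohms, c_farads, fraction=0.5):
    """Time for a single-pole RC node to cover `fraction` of its full swing."""
    return -r_ohms * c_farads * math.log(1.0 - fraction) * 1e9

R_LOAD = 10e3                 # assumed depletion-load on resistance: 10 kilohms
R_PULLDOWN = R_LOAD / 4       # pull-down is one fourth of the load (from text)
C_LOAD = 0.1e-12              # assumed load capacitance: 0.1 pF

fall = rc_delay_ns(R_PULLDOWN, C_LOAD)  # 1 -> 0: discharge through the pull-down
rise = rc_delay_ns(R_LOAD, C_LOAD)      # 0 -> 1: charge through the load
print(f"fall {fall:.3f} ns, rise {rise:.3f} ns, rise/fall {rise / fall:.1f}")
```

Whatever values are chosen for R and C, in this model the rise time exceeds the fall time by the same factor as the load-to-pull-down resistance ratio.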
The only way to stop this destructive process once it has
started is to disconnect Vdd or GND. Prevention of latchup
must be designed in.


Figure 2.12 Bipolar Transistors in CMOS-Bulk [Ref. 6].


Figure 2.13 The Latchup Circuit.


The MOSIS CMOS-Bulk p-well design rules include features
for the specific purpose of reducing the




Figure 2.11 P-Well Process, Side View [Ref. 9].




REPRODUCED  AT  GOVERNMENT  EXPENSE 


Figure 2.10 P-Well Process, Top View [Ref. 6].
negative of the P+ mask), (5) contact cuts are made, and
(6) the metal is placed.

a.  Latchup  in  CMOS-pw 

One of the main problems associated with CMOS-Bulk,
both p-well and n-well, is latchup. Basically, latchup
involves the generation of a short circuit between Vdd and
GND, and it can result in the complete destruction of a chip.
Many researchers have tried to formally define the conditions
that cause latchup to occur [Ref. 8]. This task is extremely
complex because the phenomenon is so dependent on layout,
which is unique to each chip design. Though a fully quanti-
tative analysis of latchup is still not available, a quali-
tative analysis will show what happens on the chip when
latchup occurs.

Looking at the side view of an inverter in Figure 2.12,
parasitic bipolar transistors can be seen. The base of the
npn transistor is the p-well and the base of the pnp
transistor is the n-doped substrate. These parasitic
transistors are connected as shown in Figure 2.13. If the
output of the gate goes below GND by a value equal to the
threshold of the npn transistor, its emitter starts to inject
current (electrons) into the base (p-well), and the resultant
collector current flows to the Vdd node. If the resistance R1
between the Vdd node and the source of the pull-up p-channel
MOS transistor is large enough, the voltage drop across R1
will exceed the threshold of the pnp transistor. The
collector current (holes) of the pnp device then flows to the
GND node. If the resistance R2 between the GND node and the
source of the pull-down n-channel MOS transistor is great
enough, the resultant voltage drop across R2 will increase
the base current in the npn transistor. As is evident, there
is positive feedback.
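The positive feedback can be illustrated with a toy loop-gain model. The model assumes that a current injected into the npn base is multiplied by the npn current gain, that this collector current becomes the pnp base current, and that the pnp collector current returns to the npn base. It deliberately ignores the shunting effect of R1 and R2 and the turn-on thresholds described above, so it is qualitative in the same spirit as the discussion. The gain values and function names are assumptions for illustration.

```python
def loop_currents(beta_npn, beta_pnp, i_trigger, trips):
    """Current re-injected into the npn base after each trip around the loop."""
    currents = []
    i_base_npn = i_trigger
    for _ in range(trips):
        i_collector_npn = beta_npn * i_base_npn       # becomes pnp base current
        i_collector_pnp = beta_pnp * i_collector_npn  # fed back to the npn base
        i_base_npn = i_collector_pnp
        currents.append(i_base_npn)
    return currents

def regenerates(beta_npn, beta_pnp):
    """Toy criterion: the disturbance grows if the loop gain exceeds one."""
    return beta_npn * beta_pnp > 1.0

print(loop_currents(0.5, 0.5, 1e-3, 4))   # assumed low gains: current decays
print(loop_currents(10.0, 2.0, 1e-3, 4))  # assumed high gains: current grows
```

Each trip multiplies the current by the product of the two gains. When that product is below one, a trigger dies out. When it exceeds one, the current grows until it is limited by the external supply. This is why reducing the gain of the parasitic transistors makes latchup harder to start.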


1. CMOS-SOS

The only process currently offered by the Metal-Oxide
Semiconductor Implementation Service (MOSIS) that uses an
electrically insulating substrate is silicon on sapphire
(SOS). In this technology the n-channel and p-channel tran-
sistors are formed on silicon islands left after etching an
epitaxial layer of silicon on a sapphire (Al2O3) substrate.

2. CMOS-Bulk

The other CMOS processes offered by MOSIS all use
CMOS-Bulk p-well technology. The p-well processes differ in
the number of layers of metal interconnection (1 or 2) and in
the presence or absence of capacitors. In CMOS-Bulk p-well
(n-well) the substrate is n-doped (p-doped) and the
p-channel (n-channel) devices are placed in this substrate.
To isolate the n-channel (p-channel) devices from the
substrate, a heavily doped p-well (n-well) is first placed to
act as the back gate. The heavy doping of the p-well (n-well)
degrades the performance of the n-channel (p-channel) device
while the p-channel (n-channel) device is optimized. In
p-well CMOS, though the mobility of electrons in the
n-channel device still exceeds that of the holes in the
p-channel device, the performance difference of the transis-
tors is minimized. The more uniform performance of the two
transistor types makes the p-well process appropriate for
CMOS random logic.

Figures 2.10 and 2.11 represent the top and side views
of the steps of the CMOS-pw process for the production of an
inverter. These steps are: (1) starting with an n-type
substrate, the p-well is patterned, (2) the active areas in
the p-well and on the substrate are established, (3) the
polysilicon is patterned, (4) the two ion implant masks are
placed (the N+ mask is simply the photographic



area consuming in this case because these are simple gates
with only a few inputs. Each NOR gate, if implemented stati-
cally, would need two n-channel devices and two p-channel
devices. If implemented dynamically, each NOR gate requires
three transistors of one type (one for each input and one
for the control signal) and one transistor of the other type
(again for the control signal). The number of transistors
needed remains the same, but the dynamic logic requires the
designer to keep three inputs electrically isolated instead
of just two. If the dynamic design technique is domino, six
additional inverters will also be needed. As can be seen in
Figure 2.4, in CMOS a NOR gate can be constructed from just
one stage. Adding the follow-on inverter of the domino design
results in an OR gate, so a second inverter is required to
return the logic to that of a NOR gate.
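The transistor counts in this paragraph follow a simple pattern for an n-input NOR gate, tabulated below. The static and dynamic counts follow the text. The domino count assumes that each of the two extra inverters is a standard two-transistor CMOS inverter. The function names are hypothetical.

```python
def nor_transistor_count(n_inputs, style):
    """Transistors needed for one n-input NOR gate in a given design style."""
    if style == "static":
        # n p-channel devices in series plus n n-channel devices in parallel
        return 2 * n_inputs
    if style == "dynamic":
        # one device per input plus one clocked device of each type
        return n_inputs + 2
    if style == "domino":
        # dynamic stage + follow-on inverter gives OR; a second inverter
        # restores NOR (each inverter assumed to be 2 transistors)
        return n_inputs + 2 + 2 * 2
    raise ValueError(f"unknown style: {style}")

def isolated_gate_signals(n_inputs, style):
    """Separate gate signals the designer must keep isolated per gate."""
    return n_inputs if style == "static" else n_inputs + 1  # plus the clock

for style in ("static", "dynamic", "domino"):
    print(style, nor_transistor_count(2, style), isolated_gate_signals(2, style))
```

For the two-input case, the static and dynamic versions both need four transistors. In this count the domino version doubles that to eight.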


C.  CMOS  IMPLEMENTATION  TECHNOLOGIES 

One  of  the  principal  issues  in  the  design  of  a  process 
to  implement  CMOS  digital  circuits  in  silicon  is  how  to 
isolate  the  two  types  of  devices.  This  can  be  accomplished 
by  using  a  completely  insulating  substrate  or  through  a  more 
complex  fabrication  process. 
logical AND of the Boolean function (in1 in2 + in3) to be
implemented  and  a  control  (clock)  signal.  When  the  clock  is 
low,  the  circuit  is  precharged,  and  when  the  clock  is  high 


Figure 2.8 Domino CMOS Structure [Ref. 6].


evaluation occurs. With a common clock shared by all the
domino gates on a chip, during the evaluation cycle the
signals ripple through the chip as though the logic were
purely static. The follow-on inverter ensures that the output
of each gate is low when evaluation begins. This prevents the
output of any gate from changing unless its precharged
internal node is pulled low by its inputs. Domino CMOS is not
always the answer, though. If the logic of Figure 2.9 were
implemented in domino CMOS, it would be more area consuming
than the same circuit implemented in static CMOS. Dynamic
CMOS is more
lost and there is steady-state power consumption during the
evaluation cycle. The circuit in Figure 2.7 (c) is precharged
when clk is low, and evaluation of the inputs takes place
when clk is high. This configuration allows only one change
of the output, from 1 to 0, so the inputs must be stable at
the time clk goes high. A change of one of the inputs from 1
to 0 after clk has gone high cannot cause the output to
return to 1.
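This one-way behavior during evaluation can be captured in a small behavioral model of the precharged three-input NAND of Figure 2.7 (c). The model assumes ideal switches and assumes that the output node holds its charge indefinitely, with no leakage. The class and method names are hypothetical.

```python
class DynamicNand3:
    """Behavioral model of a precharge/evaluate three-input NAND gate."""

    def __init__(self):
        self.out = None  # charge stored on the output node; unknown at power-up

    def clock(self, clk, in1, in2, in3):
        if clk == 0:
            # Precharge phase: the p-channel pull-up charges the output to 1.
            self.out = 1
        elif in1 and in2 and in3:
            # Evaluate phase: the series n-channel stack conducts and
            # discharges the output. Nothing recharges it until clk returns to 0.
            self.out = 0
        # Otherwise the output node simply holds its stored value.
        return self.out

gate = DynamicNand3()
print(gate.clock(0, 1, 1, 1))  # precharge
print(gate.clock(1, 1, 1, 1))  # evaluate: all inputs high, output discharges
print(gate.clock(1, 0, 1, 1))  # in1 falls during evaluation: output stays 0
```

The third call shows the point made above. Once the output has been discharged during evaluation, a late input change cannot restore it to 1 until the next precharge.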

Figure 2.7 Dynamic NAND Gates [Ref. 6].

In general, dynamic CMOS eliminates the redundancy of
static CMOS by applying all inputs to one type of device and
a control signal to the other type of device. The most
popular dynamic CMOS logic design technique is domino CMOS
[Ref. 7], illustrated in Figure 2.8. Here the output is the


Figure 2.6 NMOS-like CMOS Static Gate [Ref. 6].


approach  is  to  make  extensive  use  of  transmission  gates  to 
build  up  logic  functions.  Using  transmission  gates  means 
both  polarities  of  all  control  signals  are  required.  The 
resulting  large  number  of  wires  required  to  route  these 
control  signals  can  become  very  area  consuming,  especially 
if  only  one  metal  layer  is  available. 

A third and more effective solution is to use dynamic
logic. Figure 2.7 contains three different implementations
of a dynamic three-input NAND gate. In each, the output is
meaningful (i.e., represents the complement of the Boolean
expression in1 in2 in3) only when clk is high and its
complement is low. The circuits of Figure 2.7 (a) and (b)
depend on the pull-up to pull-down ratio to produce the
proper output. As with the NMOS-like style of design, full
excursion on the output is
Figure 2.5 CMOS Transmission Gate.

In general, CMOS technologies are ratioless. The use of
"improper" ratios will not affect the logical operation of
most CMOS gates; it will only affect the speed of operation
of the gates.

B. CMOS DESIGN METHODOLOGIES

Static CMOS gate circuits have three serious deficiencies
when compared to static NMOS gates. First, they are more area
consuming. Second, they can be slower. Though the individual
gates can be faster in CMOS, each input drives the gates of
both a p-channel and an n-channel device in parallel; thus,
the fanout3 and the output load capacitance of each circuit
are doubled. Third, a CMOS static gate is redundant,
duplicating its functionality in both the pull-up and
pull-down sections.

One approach to remedy these deficiencies is to use a
static NMOS-like style of design, as in Figure 2.6. Here the
p-channel device is always on, and the pull-up to pull-down
dimension ratio is relied upon to produce the proper output
voltage. This introduces power consumption problems and
takes away the full excursion on the output. Another


3Fanout represents the number of transistors that the
output of a logic gate must drive.
Figure 2.4 2-Input NOR Gate.


high voltages well. The resulting unpredictable voltage
drops make it necessary to utilize both types of transistors.
This increase in complexity over its NMOS counterpart is
partially offset by the absence of the level-restoring
circuitry NMOS requires following a pass transistor.2
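The need for both device types can be shown with a simple threshold-drop model. With its gate at Vdd, an n-channel pass device cannot pull its output above Vdd minus its threshold. With its gate at GND, a p-channel device cannot pull its output below the magnitude of its threshold. The 1-volt thresholds are assumed round numbers, body effect is ignored, and the function names are hypothetical.

```python
VDD = 5.0    # supply, as in the text
VTN = 1.0    # assumed n-channel threshold voltage
VTP = 1.0    # assumed magnitude of the p-channel threshold voltage

def nmos_pass(v_in):
    """Steady-state output of an n-channel pass transistor with gate at VDD."""
    return min(v_in, VDD - VTN)   # high levels are degraded by one threshold

def pmos_pass(v_in):
    """Steady-state output of a p-channel pass transistor with gate at GND."""
    return max(v_in, VTP)         # low levels are degraded by one threshold

def transmission_gate(v_in):
    """Both devices on in parallel: the output settles at v_in as long as
    at least one of them still conducts at that level."""
    if nmos_pass(v_in) == v_in or pmos_pass(v_in) == v_in:
        return v_in
    raise ValueError("neither device conducts at this input level")

for v in (0.0, 2.5, 5.0):
    print(v, nmos_pass(v), pmos_pass(v), transmission_gate(v))
```

Each single device loses one threshold at one end of the range. The parallel pair covers the whole 0 to Vdd range, which is why no restoring circuitry is needed after a CMOS transmission gate.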


2In NMOS digital circuits the length-to-width ratio of
the depletion mode load transistor is usually four times that
of the pull-down transistor. This ratio is required to ensure
sufficient excursion of the output voltage. However, after a
pass transistor is used, a ratio of 8:1 rather than 4:1 must
be used to compensate for the threshold voltage drop across
the pass transistor.


Figure 2.3 Minimum Dimension Inverters.


disconnection between the output and Vdd. Logically these
two actions are equivalent; therefore only one action should
be necessary to implement the logic. Design methodologies
to accomplish this are described in section B of this
chapter. The similarity between the CMOS transmission gate of
Figure 2.5 and the NMOS pass transistor is evident. The
major difference lies in the bilateral nature of the CMOS
transmission gate: it is made up of both n-channel and
p-channel devices and requires both polarities of the
control signal for operation. The reason for this bilateral
requirement is that the p-channel device does not transmit
low voltages well and the n-channel device does not transmit


the n-channel is greater than the mobility of the holes in
the p-channel. Also, the capacitive load seen by the
p-channel device in CMOS p-well (CMOS-pw) is greater than
the load seen by the n-channel device because of the highly
doped p-well. Typically, the result in CMOS-pw is a slightly
longer transition time for the 0 to 1 output transition. Some
designers attempt to compensate for this by consistently
making the p-channel transistors wider than the n-channel
transistors.
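To first order, a transistor's on resistance scales as L / (mu W). Matching the pull-up and pull-down resistances of an inverter therefore means widening the p-channel device by roughly the electron-to-hole mobility ratio. The mobility values below are assumed, order-of-magnitude numbers for illustration only. The sketch ignores the well-capacitance effect mentioned above, and the function names are hypothetical.

```python
MU_N = 600.0   # assumed electron mobility, cm^2/(V*s)
MU_P = 250.0   # assumed hole mobility, cm^2/(V*s)

def relative_on_resistance(length, width, mobility):
    """First-order on resistance, up to a common process constant."""
    return length / (mobility * width)

def matched_p_width(n_width, n_length=1.0, p_length=1.0):
    """p-channel width giving the same on resistance as the n-channel device."""
    return n_width * (MU_N / MU_P) * (p_length / n_length)

wp = matched_p_width(3.0)
print(f"p-channel width for a 3-unit-wide n-channel device: {wp:.1f}")
```

With these assumed mobilities the p-channel device must be about 2.4 times as wide as the n-channel device. The actual factor depends on the process.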

Unlike NMOS, the output of a CMOS digital circuit makes
a full excursion between Vdd and GND. This makes CMOS
circuits less sensitive to noise than NMOS circuits. CMOS
should also benefit more from future reductions in feature
size. NMOS is more restricted in ultimate feature size
because the power dissipation requirements of the depletion
mode devices will create more problems as feature sizes
shrink. Figure 2.3 shows the relative sizes of minimum
dimension inverters implemented in currently available
3-micron feature size CMOS-pw and NMOS technologies.

2. The NOR Gate and Transmission Gate

Figure 2.4 shows the circuit diagrams and layouts of a
two-input NOR gate implemented in both CMOS-pw and NMOS.
From Figures 2.3 and 2.4 it is evident that static1 CMOS
gates are more complex and area consuming than their NMOS
counterparts. In these fully complementary circuits a
redundancy in the structures is evident: the pull-up alone
or the pull-down alone would be sufficient to implement the
logic. In the CMOS circuits of Figures 2.3 and 2.4 the
inputs must perform two tasks. A logical 1 on an input
causes both a connection between the output and ground and a


1Static logic circuits continuously evaluate their
inputs and produce their specified logic output. Dynamic
circuits perform logical evaluation of the inputs only when
directed to do so by control signals and/or clock signals.


Figure 2.2 (a) NMOS Inverter (b) CMOS Inverter.


In  the  CMOS  inverter  of  Figure  2.2  (b)  the  input  is 
applied  to  the  gates  of  both  devices.  An  input  of  logical  1 
causes  the  n-channel  device  to  switch  on  and  the  p-channel 
device  to  switch  off,  resulting  in  an  output  of  logical  0. 
Similarly,  an  input  of  0  results  in  an  output  of  1.  In  both 
cases,  one  device  is  fully  off,  representing  a  resistance  on 
the  order  of  gigaohms.  Thus,  the  steady  state  power 
consumption is essentially zero. In operation the only
power consumption of consequence occurs during the
transition, when neither transistor is fully on or off.
Additionally,  since  the  output  load  capacitance  is  both 
charged  and  discharged  through  a  turned  on  transistor,  the  1 
to  0  and  0  to  1  switching  delays  are  theoretically  the  same. 
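The complementary structure can be checked exhaustively with a switch-level model. In this model an n-channel device conducts when its gate is 1 and a p-channel device conducts when its gate is 0, as stated earlier in this chapter. The sketch evaluates the pull-up and pull-down networks of the CMOS inverter and of a two-input NOR gate. For every input it confirms that exactly one network conducts, so there is never a steady-state path from Vdd to GND and the output is never left floating. The function names are hypothetical.

```python
from itertools import product

def n_on(gate):
    """n-channel enhancement device: on when its gate is 1."""
    return gate == 1

def p_on(gate):
    """p-channel enhancement device: on when its gate is 0."""
    return gate == 0

def inverter(a):
    pull_up = p_on(a)                  # one p-channel device to Vdd
    pull_down = n_on(a)                # one n-channel device to GND
    return pull_up, pull_down

def nor2(a, b):
    pull_up = p_on(a) and p_on(b)      # p-channel devices in series
    pull_down = n_on(a) or n_on(b)     # n-channel devices in parallel
    return pull_up, pull_down

def truth_table(gate, n_inputs):
    """Map each input combination to the output, checking that exactly
    one of the two networks conducts."""
    table = {}
    for inputs in product((0, 1), repeat=n_inputs):
        up, down = gate(*inputs)
        assert up != down, f"static path or floating output at {inputs}"
        table[inputs] = 1 if up else 0
    return table

print(truth_table(inverter, 1))
print(truth_table(nor2, 2))
```

The model captures logic and static current paths only. Switching delay, and the brief transition current mentioned above, are outside its scope.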

Actually, the switching delays depend on many parame-
ters. The n-channel and p-channel device dimensions are
frequently not the same, the mobility of the electrons in
probability of latchup. The minimum separation rules for
p-wells and P+ doped active areas exist for this purpose.
Their aim is to reduce the gain of the parasitic bipolar
transistors, thus requiring a larger noise spike of longer
duration to start the latchup sequence. A frequently used
technique is the grounding of the p-well, as illustrated in
Figure 2.14. Here the effect of the P+ doped area covering
half of the contact cut for the ground bus is to reduce the
resistance R2 in Figure 2.13. Another practice is to place
a small capacitor across the Vdd and GND pins of CMOS-Bulk
chips. To provide capacitive filtering of noise spikes on
the chip, Vdd and GND busses are frequently run close
together. Also, Vdd input pads are designed to provide
capacitance between Vdd and GND.


Figure 2.14 Grounding of the P-Well.


3.  Twin-tub  CMOS 


This process, also called twin-well, uses both n-wells
and p-wells on a high resistivity N- or P- substrate, or in
an epitaxial layer of silicon on a P+ or N+
wafer.  Since  the  well  doping  does  not  have  to  overcome  the 
substrate  doping,  both  the  n-channel  transistors  in  the 
p-well  and  the  p-channel  transistors  in  the  n-well  can  be 
optimized.  Domino  CMOS  is  enhanced  by  the  use  of  this 
process  since  the  optimized  n-channel  devices  can  speed  up 
the  complex  boolean  expression  evaluation  and  the  optimized 
p-channel  devices  can  speed  up  the  signal  drive  between 
stages (thereby reducing the effect of a given fanout).


D.  CMOS  TECHNOLOGY  SELECTION 

The CMOS implementation technologies available from
MOSIS are CMOS-Bulk p-well with one metal layer, CMOS-Bulk
p-well with two metal layers, CMOS-Bulk p-well with two
metal layers and capacitors (for analog circuits), and
CMOS-SOS.

The  advantages  of  CMOS-Bulk  are:  (1)  very  good  noise 
margin,  (2)  faster  than  NMOS,  and  (3)  a  proven  reliable 
fabrication  process.  Its  disadvantages  are:  (1)  latchup 
susceptibility,  (2)  use  of  p-well  guard  rings  is  needed  if 
radiation  hardening  is  desired,  (3)  lower  circuit  density 
than  NMOS  or  CMOS-SOS,  and  (4)  more  complex  design  rules 
than  either  NMOS  or  CMOS-SOS. 

The advantages of CMOS-SOS are: (1) faster than NMOS or
CMOS-Bulk, (2) very good noise margin, (3) intrinsically
radiation hardened, and (4) no latchup. Its disadvantages
are: (1) expensive fabrication process due to the sapphire,
(2) sapphire variability reduces the reliability of the
fabrication process, (3) thermal mismatch between the
sapphire and silicon limits the carrier mobility, and (4) it
is not a viable technology for dynamic memory due to back
channel leakage.


CMOS-Bulk  p-well  was  selected  as  the  implementation 
process  for  the  adder  for  the  following  reasons.  First, 
technology  files  for  this  process  were  available  at  the 
Naval  Postgraduate  School  (NPS)  enabling  the  use  of  extant 
computer  aided  design  (CAD)  tools.  Second,  since  this  would 
be  the  first  CMOS  VLSI  design  at  NPS,  utilizing  the  most 
reliable  process  is  prudent  to  prevent  design  problems  from 
being  clouded  by  implementation  process  problems. 


III.  DESIGN  TOOLS 


To  employ  the  Mead-Conway  design  methodology  on  a  large 
scale  design,  three  computer  aided  design  (CAD)  tools  are 
needed.  A  layout  design  editor  for  viewing  the  circuits  as 
they  are  created  is  the  first  tool  required.  Next,  a  design 
rule  checker  is  necessary  to  confirm  that  all  the  design 
rules  for  the  specified  technology  have  been  adhered  to. 
Though not a complex task, the large number of checks that
must be made for even a modest design makes manual design
rule checking highly error prone. Finally, a circuit simu-
lator is needed to verify that the circuit as designed
provides the proper logical output. In the design of the
sixteen-bit pipelined adder, the Caesar layout editor
[Ref. 4], the Lyra design rule checker [Ref. 10], and C.
Terman's RNL circuit simulator [Ref. 11] were employed.

A.  CAESAR 

Caesar  is  a  generic  layout  editor.  It  is  not  designed 
for  any  particular  VLSI  implementation  technology.  It  is 
not  even  limited  to  designing  integrated  circuits.  Caesar 
is a graphics layout editor for the creation and manipula-
tion of rectangles whose color, size, and placement are
specified by the user. It is through the user-specified
technology file that the rectangles of color take on meaning.
At the Naval Postgraduate School (NPS) there are two
technology files available for use with Caesar. One is for
N-doped metal oxide semiconductors (NMOS) and the other is
for complementary metal oxide semiconductors utilizing a
P-doped well (CMOS-pw).


Caesar works with files of its own special format. These
files are indicated by an appended file type of .ca (e.g.,
mx.ca). On command, Caesar will generate a Caltech
Intermediate Format (CIF) file of the same layout. Again, it
is the technology file that tells Caesar which CIF layer
labels to attach to the colored rectangles.
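As an illustration of what the CIF translation involves, the sketch below converts one rectangle, given by its corners in Caesar units, into a CIF layer command and box command. In CIF, `L name;` selects a layer and `B length width xcenter ycenter;` draws a box, with distances in hundredths of a micron. The sketch makes three assumptions. The layer name NM (the Mead-Conway NMOS metal layer) is used only as an example, since the real names come from the technology file. The 150-centimicron Caesar unit is assumed, since the actual length is specified when the CIF file is written. The function name is hypothetical.

```python
def caesar_box_to_cif(layer, x1, y1, x2, y2, unit_centimicrons):
    """Translate a rectangle with corners (x1, y1)-(x2, y2), in Caesar
    units, into CIF commands with distances in hundredths of a micron."""
    if x2 <= x1 or y2 <= y1:
        raise ValueError("expected lower-left corner, then upper-right corner")
    length = (x2 - x1) * unit_centimicrons   # extent along x
    width = (y2 - y1) * unit_centimicrons    # extent along y
    # Work with doubled centers so that integer arithmetic stays exact.
    cx2 = (x1 + x2) * unit_centimicrons
    cy2 = (y1 + y2) * unit_centimicrons
    if cx2 % 2 or cy2 % 2:
        raise ValueError("box center does not fall on a whole centimicron")
    return f"L {layer};\nB {length} {width} {cx2 // 2} {cy2 // 2};"

print(caesar_box_to_cif("NM", 0, 0, 3, 10, 150))
```

With an even unit length the box center always falls on a whole centimicron. The explicit check only matters for odd unit lengths.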

At NPS, Caesar is set up to take commands both from the
terminal where execution of the Caesar program is initiated
(usually the ADM-3a console adjacent to the color graphics
display unit) and from a four-button puck on a graphics
tablet attached to the color display device. Caesar displays
its graphics results on an AED 767 color monitor and displays
its menus, messages, and prompts on the command console.
Detailed information on the installation and operation of
Caesar at NPS can be found in Reference 4 and Reference 2.

Caesar is an interactive CAD tool. The results of any
command are rapidly displayed on the AED 767. The results
of a command may be undone (u) or repeated (.) with a single
stroke of the specified key on the command console. While
running Caesar, a user may also call upon the design rule
checker, Lyra, to check the area inside and within three
Caesar units4 of the current box for design rule violations.
This interactive use of the layout graphics display and the
design rule checker helps to ensure that there will not be
any design-rule-forced changes late in the design cycle, when
changes are much more time consuming. With Caesar's level
of interaction with the designer, the design loop consisting
of (1) issuing commands to perturb the existing circuit,
(2) visual inspection to verify that the commands generated
the desired results, and (3) design rule checking of the new
circuit can be rapidly completed.

4A Caesar design is laid out on a grid of Caesar units.
These units do not represent any specific length. When
creating a CIF file from a Caesar file, the desired length of
a Caesar unit is specified.

Caesar is a hierarchical design tool. With Caesar,
circuits can be created by piecing together cells (other
files of type .ca), which in turn may be made up of other
sub-cells. Theoretically, there is no limit to the number
of levels in the hierarchy. Not only can cells (sub-cells,
etc.) be called upon to fill locations in a circuit; if they
need to be modified to function properly, Caesar provides a
subedit mode to facilitate editing of layouts one level
below the current editing level. Care must be taken when
this subedit feature is used, since the changes made to the
cell are global: everywhere the given cell is used on the
chip, the newly edited version will appear.

B. LYRA


like  Caesar,  Lyra  is  a  generic  design  rule  checker. 
When  Lyra  is  invoked  from  within  Caesar,  the  actual  program 
executed  to  check  for  design  rule  errors  depends  on  the 
technology  file  indicated  in  the  header  of  the  Caesar  file 
being  edited.  After  running,  Lyra  sends  a  message  to  the 
command  console  indicating  the  number  of  errors  found.  On 
the  graphics  display  Lyra  paints  the  exact  location  of  each 
error  and  labels  each  error  with  the  design  rule  violated. 
The  error  label  consists  of  abbreviations  for  the  layers 
involved,  followed  by  an  underscore,  followed  by  an  abbrevi¬ 
ation  for  the  type  of  violation  detected.  Table  1  lists  the 
abbreviations  used  by  Lyra  for  CMOS-pw. 
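A minimal Python sketch of splitting such a label, assuming the `<layers>_<violation>` form described above. The dictionaries hold only a few abbreviations from Table 1, and the label `md_s` is a hypothetical example, not captured Lyra output.

```python
# Only a few abbreviations from Table 1 are listed here; extend as needed.
LAYER_NAMES = {"m": "metal", "d": "n+ diffusion", "c": "cut"}
ERROR_NAMES = {"s": "minimum separation"}

def decode_label(label):
    """Split a Lyra-style label '<layers>_<violation>' into readable names."""
    layers, sep, error = label.partition("_")
    if not sep:
        raise ValueError(f"no underscore in label {label!r}")
    return [LAYER_NAMES[ch] for ch in layers], ERROR_NAMES[error]

print(decode_label("md_s"))   # (['metal', 'n+ diffusion'], 'minimum separation')
```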

The winter 1983 distribution of the University of California at Berkeley (UCB) CAD tools included two versions of Lyra: one for the Mead-Conway NMOS design rules and the other for the Jet Propulsion Laboratory's (JPL) five-micron feature size CMOS-pw design rules. Since MOSIS no longer supports fabrication of the JPL CMOS-pw process, design rules for the MOSIS-supported three-micron CMOS-pw process were obtained. Professor Marco Annaratone at Carnegie-Mellon University (CMU) generated the listing of the three-micron CMOS-pw design rules compatible with Lyra and has provided NPS with a copy. To generate executable code from the prototype Lyra program and embed the specific process design rules, the program rulec (see Appendix B) is run with the design rule list file as its argument.


TABLE 1

Lyra Error Abbreviations

Layer                  Abbreviation
polysilicon            P
metal                  m
p-well                 w
n+ diffusion           d
cut                    c
p+ diffusion           p

Error                  Abbreviation
minimum width          w
minimum separation     s
malformed transistor   X

Now, when Lyra is invoked from Caesar while a CMOS-pw technology circuit is being edited, the three-micron minimum feature size CMOS-pw design rules are applied. This version of Lyra does not check for violations of maximum dimensions. The only maximum-size design rule in this technology is for contact cuts, which may not exceed 3 microns by 8 microns. Improper contact cuts can be avoided by exploiting Caesar's hierarchical nature: contact cuts of all needed sizes and types are generated once and saved, to be inserted as cells wherever needed.
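Because Lyra does not perform this check, a separate pass could flag oversized cuts. The sketch below is hypothetical. It assumes boxes are available as `(layer, x0, y0, x1, y1)` tuples in microns (one micron per Caesar unit), and that the 3 x 8 micron limit applies in either orientation.

```python
MAX_CUT = (3, 8)   # maximum contact cut, microns, in either orientation

def oversized_cuts(boxes, max_cut=MAX_CUT):
    """Return the contact-cut boxes whose dimensions exceed max_cut."""
    short_max, long_max = sorted(max_cut)
    bad = []
    for layer, x0, y0, x1, y1 in boxes:
        if layer != "cut":
            continue                       # only contact cuts have a maximum size
        short, long = sorted((abs(x1 - x0), abs(y1 - y0)))
        if short > short_max or long > long_max:
            bad.append((layer, x0, y0, x1, y1))
    return bad

boxes = [("cut", 0, 0, 3, 3), ("cut", 10, 0, 13, 8),
         ("cut", 20, 0, 24, 4), ("metal", 0, 0, 50, 4)]
print(oversized_cuts(boxes))   # [('cut', 20, 0, 24, 4)]
```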


C. SIMULATION

Once a circuit layout has completed this initial design loop, it matches the designer's conception of how it should appear and is free of design rule violations. The performance of the given circuit, though, remains uncertain. To simulate the performance of the design, programs such as SPICE [Ref. 11] and RNL [Ref. 11] are used.
1. SPICE


SPICE is an important simulation tool in the design of high-speed CMOS digital and analog circuits. With its detailed device modeling, SPICE can provide accurate predictions of performance once the device parameters of the implementation technology are known. SPICE provides the logical output of a circuit based upon its inputs and describes the transient behavior of the circuit as it changes to the new logical output. Thus SPICE enables a designer to optimize transistor dimensions for speed.

Unfortunately, the version of SPICE currently available on both the VAX-11/780 and the IBM 3033 at NPS (version 2G6) fails when the parameters of the devices fabricated by the MOSIS three-micron CMOS-pw process are used. With these parameters, the transient behavior solutions do not converge.

Engineers at CMU, UCB, and the University of Washington (UW) are currently employing an experimental version of SPICE (version 2X.x, developed at UCB) that simulates successfully with the three-micron CMOS-pw device parameters. This version, however, has other bugs and is therefore not available for general distribution. The changes to SPICE 2G6 that enable SPICE 2X.x to simulate the three-micron CMOS-pw devices will be incorporated into the next distribution of SPICE (version 2G7). The Naval Postgraduate School is in the queue of institutions to receive SPICE 2G7 once it is ready.

In order to run a SPICE simulation of a CMOS circuit designed using Caesar, the following steps should be executed. First, the labeling feature of Caesar is used to place labels on the electrical nodes of interest in the circuit (Vdd, GND, input, output, etc.). Second, the Caesar command

     :cif 100 -p

is issued to generate the basename.cif file. The parameter 100 indicates a scale of 100 centimicrons per Caesar unit⁵ and must be specified unless the default value of 200 centimicrons per Caesar unit is desired. The -p option causes entries to be made in the basename.cif file for the labels assigned. Third, after exiting Caesar and returning to Unix, the circuit extractor Mextra [Ref. 10] is invoked using the command

     % mextra basename

to create the file basename.sim. To convert the basename.sim file to a SPICE file (basename.spice), the program sim2spice [Ref. 11] is used. The basename.spice file contains a list of the transistors and capacitors in the circuit in a SPICE-compatible format.
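The effect of the scale parameter is simple arithmetic, sketched below in Python (the helper name is hypothetical). A design drawn at one micron per Caesar unit but written out with the default scale comes out twice as large.

```python
CENTIMICRONS_PER_MICRON = 100    # CIF lengths are in hundredths of a micron

def caesar_to_microns(length_in_units, centimicrons_per_unit=200):
    """Physical length written to CIF for a dimension drawn in Caesar units."""
    return length_in_units * centimicrons_per_unit / CENTIMICRONS_PER_MICRON

# A 3-unit transistor length drawn at one micron per Caesar unit:
print(caesar_to_microns(3, 100))   # 3.0 microns with ":cif 100 -p"
print(caesar_to_microns(3))        # 6.0 microns with the default scale of 200
```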

The basename.spice file must be edited to add the model parameters for the transistors, to specify the waveforms of the input(s), to specify the type of analysis to be performed (usually transient analysis), and to specify the output to be produced (tables, graphs, etc.). The SPICE User's Manual [Ref. 11] contains the formats of these additions to basename.spice. Best-case and worst-case device model parameters for the MOSIS three-micron CMOS-pw process, as compiled by Dr. M. Annaratone of CMU and Dr. L. Glasser of MIT, are found in Appendix A.
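These edits can also be scripted. The Python sketch below makes several assumptions. It assumes the extracted deck's transistor cards reference models named `NFET` and `PFET` and use integer node numbers, as SPICE 2 requires. The node numbers, model names, pulse timing, and 5 V supply are placeholders, and the real model parameters come from Appendix A.

```python
def spice_additions(model_cards, vdd_node, in_node, out_node, vdd=5.0):
    """Cards to append to an extracted deck: models, sources, analysis, output."""
    cards = list(model_cards)
    cards.append(f"VDD {vdd_node} 0 DC {vdd}")
    # Input pulse: 0 V to Vdd, no delay, 1 ns edges, 20 ns wide, 40 ns period.
    cards.append(f"VIN {in_node} 0 PULSE(0 {vdd} 0 1NS 1NS 20NS 40NS)")
    cards.append(".TRAN 0.5NS 80NS")                          # transient analysis
    cards.append(f".PRINT TRAN V({in_node}) V({out_node})")   # tabulated output
    cards.append(".END")
    return cards

def finish_deck(deck_text, cards):
    """Remove any .END card from the extracted deck, then append the new cards."""
    body = [ln for ln in deck_text.splitlines() if ln.strip().upper() != ".END"]
    return "\n".join(body + cards) + "\n"

# Placeholder model cards: copy the real parameters from Appendix A.
models = [".MODEL NFET NMOS", ".MODEL PFET PMOS"]
# Hypothetical extracted inverter: node 1 = in, 2 = out, 3 = Vdd, 0 = GND.
extracted = "INVERTER\nM1 2 1 0 0 NFET L=3U W=4U\nM2 2 1 3 3 PFET L=3U W=8U\n.END\n"
print(finish_deck(extracted, spice_additions(models, vdd_node=3, in_node=1, out_node=2)))
```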

2. RNL

RNL is a timing and logic simulator for digital MOS circuits. It is an event-driven simulator that uses a resistance-capacitance model of a circuit to estimate node transition times and to estimate the effects of charge sharing.⁶ After input values have been assigned by the user, RNL calculates the effects of those inputs by repeating the following operations until there are no further node value changes: (1) when a node is added to the network because a transistor has turned on, the charge sharing implications of the new node's capacitance and logic state on each of its electrical neighbors are computed; (2) for each node that might be affected, Vthev and Rthev (the parameters of the Thevenin equivalent circuit) are calculated, and the new logic state is determined from Vthev (0.0Vdd to 0.3Vdd = logic 0, 0.8Vdd to 1.0Vdd = logic 1, logic X otherwise); (3) if the node has changed state, the transition time is calculated using the node's capacitance; and (4) any changes are propagated to other nodes. Details of the computation methods used by RNL can be found in the RNL Version 4.2 (UW) User's Guide [Ref. 11]. More important to the user is an understanding of what information RNL keeps, what it discards, and how it decides what to do next.

⁵Since the minimum dimensions for the 3-micron CMOS-pw process are specified in microns instead of lambda, CMOS-pw circuits are usually designed in Caesar using one micron per Caesar unit.
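Two of the calculations in this loop are easy to show in isolation. The Python sketch below has hypothetical function names and assumes a 5 V supply. It maps a Thevenin voltage to RNL's three logic states using the thresholds above. It also computes the voltage that results when isolated nodes share charge, with total charge conserved. It is not RNL's actual code.

```python
def logic_state(v_thev, vdd=5.0):
    """RNL's three-valued state from a Thevenin voltage (thresholds as above)."""
    frac = v_thev / vdd
    if frac <= 0.3:
        return "0"
    if frac >= 0.8:
        return "1"
    return "X"

def charge_share(nodes):
    """Common voltage after isolated nodes, given as (capacitance, voltage)
    pairs, are connected: total charge sum(C*V) is conserved."""
    total_c = sum(c for c, _ in nodes)
    return sum(c * v for c, v in nodes) / total_c

for nodes in ([(0.4e-12, 5.0), (0.05e-12, 0.0)],    # large charged node, small empty one
              [(0.1e-12, 5.0), (0.1e-12, 0.0)]):    # equal capacitances
    v = charge_share(nodes)
    print(round(v, 2), logic_state(v))
# 4.44 1
# 2.5 X
```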

Basic to the operation of RNL is the idea of an event. The three elements of an RNL event are (1) a node in the network, (2) a new logic state for the node, and (3) the time when the node value changes to the new logic state. RNL maintains a list of events, sorted by time, that tells what processing remains to be done. When the user changes an input, an event is added to the list. RNL sequentially processes the next event on the list, stopping when (1) the list is empty, (2) a node the user is tracing changes value, or (3) the specified simulation time interval has elapsed. To process an event, RNL removes it from the list, changes the node's state to reflect its new value, and then calculates any new events resulting from the node's new value.

⁶Charge sharing refers to the capacitive effects that happen when two or more previously unconnected nodes, each having some charge and capacitance, become connected by a resistor (a transistor turning on).
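The event list behaves like a priority queue keyed on time. The simplified sketch below uses Python's `heapq`. `compute_new_events` is a hypothetical stand-in for RNL's network calculations. The sketch omits charge sharing and the per-node event rules described later in this section.

```python
import heapq

def run_events(initial_events, compute_new_events, traced=(), stop_time=float("inf")):
    """Process (time, node, value) events in time order.

    compute_new_events(node, value, time) returns further (time, node, value)
    events. Stops when the list is empty, a traced node changes value, or the
    next event lies beyond stop_time. Returns (node values, events still pending).
    """
    values = {}
    queue = list(initial_events)
    heapq.heapify(queue)                       # the event list, sorted by time
    while queue:
        if queue[0][0] > stop_time:
            break                              # simulation interval has elapsed
        time, node, value = heapq.heappop(queue)
        changed = values.get(node) != value
        values[node] = value                   # the node takes its new state
        for event in compute_new_events(node, value, time):
            heapq.heappush(queue, event)
        if changed and node in traced:
            break                              # a traced node changed value
    return values, queue

# Hypothetical two-inverter chain a -> b -> c, 2 ns per stage.
NEXT = {"a": "b", "b": "c"}

def inverter_chain(node, value, time):
    nxt = NEXT.get(node)
    return [(time + 2.0, nxt, 1 - value)] if nxt else []

print(run_events([(0.0, "a", 1)], inverter_chain)[0])   # {'a': 1, 'b': 0, 'c': 1}
print(run_events([(0.0, "a", 1)], inverter_chain, traced=("b",)))   # stops once b changes
```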

In calculating new events, RNL first finds and marks all nodes that might be affected by the change. This includes the source and drain of all transistors for which the current node is the gate, and all nodes connected to these nodes through turned-on transistors. The search through the network stops when a non-conducting transistor or an input is reached. For each marked node, two calculations are made. First, a charge sharing calculation is performed to model changes of state due to the charging and discharging of node capacitances. Second, a final value calculation is done to determine the node's ultimate logical state.
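The marking step is a graph search. The sketch below assumes transistors are `(gate, source, drain)` tuples and that a caller-supplied `is_on` predicate reports which devices conduct. It treats inputs (including Vdd and GND) as barriers that are not themselves marked. All names are hypothetical.

```python
from collections import deque

def affected_nodes(changed, transistors, is_on, inputs):
    """Nodes that may change after node `changed` switches.

    transistors: (gate, source, drain) tuples; is_on(t) says whether t conducts;
    inputs: nodes (including Vdd and GND) that stop the search and are not marked.
    """
    frontier = deque(n for g, s, d in transistors if g == changed for n in (s, d))
    marked = set()
    while frontier:
        node = frontier.popleft()
        if node in marked or node in inputs:
            continue
        marked.add(node)
        for t in transistors:
            _, s, d = t
            if node in (s, d) and is_on(t):   # non-conducting devices end the search
                frontier.append(d if node == s else s)
    return marked

P1 = ("a", "vdd", "out")     # pull-up
N1 = ("a", "out", "gnd")     # pull-down
T1 = ("clk", "out", "x")     # pass transistor
conducting = {N1, T1}        # illustrative snapshot of which devices are on
print(sorted(affected_nodes("a", [P1, N1, T1], lambda t: t in conducting, {"vdd", "gnd"})))
# ['out', 'x']
```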

A given node can have only two events pending: (1) a charge sharing event describing an immediate change in the node's state due to charge redistribution among the nodes on the connection list, and (2) a final value event describing the final, driven state of the node. RNL observes the following rules for processing events: (1) when a new charge sharing event is scheduled, throw away all previously pending events for the node; and (2) when a new final value event is calculated, ignore it if (a) there is a pending final event for the same value that is scheduled to occur sooner, (b) there is a pending charge sharing event for the same value as the new final event, or (c) there is no charge sharing event and the new final value event is the same as the node's current value. These rules are based on the assumption that the event calculated last reflects the latest configuration of the network and therefore should override events calculated earlier. Charge sharing events discard any pending final value events because any charge sharing calculation is immediately followed by a new final value calculation.
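The per-node rules can be modeled as a small class that holds at most one charge sharing event and one final value event. The Python sketch below is a simplified model of the rules as stated above, not RNL's implementation. Times and values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Event = Tuple[str, float]   # (logic value, scheduled time)

@dataclass
class Pending:
    """The at-most-two events kept for one node (simplified model)."""
    current: str
    charge_share: Optional[Event] = None
    final: Optional[Event] = None

    def schedule_charge_share(self, value, time):
        # Rule 1: a new charge sharing event discards everything pending.
        self.charge_share = (value, time)
        self.final = None

    def schedule_final(self, value, time):
        """Apply rule 2; return True if the new final value event is kept."""
        if self.final and self.final[0] == value and self.final[1] < time:
            return False                  # (a) same value already due sooner
        if self.charge_share and self.charge_share[0] == value:
            return False                  # (b) charge sharing already yields it
        if self.charge_share is None and value == self.current:
            return False                  # (c) no change from the current value
        self.final = (value, time)        # the latest calculation overrides
        return True

out = Pending(current="1")
print(out.schedule_final("0", 5.0))   # True: a real change is scheduled
out.schedule_charge_share("X", 5.5)   # a later change triggers charge sharing...
print(out.final)                      # None: ...and the pending final event is gone
```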
These event rules, however, sometimes lead RNL to generate incorrect results. This is especially true of signal-driven circuits (circuits where inputs are applied to the source and drain of a transistor as well as to its gate) and of circuits whose behavior depends on the analog properties of the devices. For example, consider the first exclusive OR gate design for the pipelined adder in Figure 3.1. This design has been proven to function correctly at CMU; however, the RNL simulation shows this circuit failing.

Figure 3.1 CMOS Exclusive OR [Ref. 6].

Starting in a state where A=0, B=1, and out=1, assume that input A then transitions to 1. Initially Q1, Q3, Q4, and Q6 are on. When input A goes high, Q3 is turned off (no events generated) and Q2 is turned on, generating a charge sharing event and a final value event for Abar, resulting in Abar going low. When Abar goes low, the still-on Q6 now tries to drive the output node low, while the still-on Q4 still tries to drive the output node high. (RNL recognizes that it takes a finite amount of time for Q4 to turn off, but does not recognize that n-channel transistors do not conduct high voltages well.) The result is an output of X, the undefined state. Next, Q4 is turned off. Since turning off Q4 adds no new nodes to the network, the event list is empty and the output remains at X. The primary difficulty RNL has with this circuit is that the output node is controlled by two nodes that can change at different times. As a result, a charge sharing event due to one input can eliminate a final value event of the other, even though that final value event is what determines the circuit's actual behavior.

The circuit of Figure 3.2 is a proven latch design that also fails in RNL simulation. In Figure 3.2 the fractions next to the transistors represent the length-to-width ratios of the devices. This circuit depends on these ratios for proper operation. These ratios ensure that the gain of the input signal on the gates of Q5 and Q6 is greater than the gain of the feedback signal to the same gates. RNL does not recognize the difference in these gains as sufficient to drive the gates of Q5 and Q6 to either logical 1 or 0 when the input signal is the opposite of the feedback signal. As a result, the circuit becomes locked up at X. Because of RNL's difficulty with these two circuits, other designs were employed in the final adder (see chapter 5) to facilitate testing of the overall design.

Figure 3.2 CMOS Latch Design [Ref. 6].

To use RNL as installed at NPS, the following steps should be followed. First, label the circuit and generate basename.cif as before. Again the program Mextra is used to extract the circuit, this time with the -o option (mextra basename -o). The -o option causes Mextra not to compute capacitances; a follow-on program in this sequence, Presim, performs this computation with greater accuracy. Note that there are three different circuit extraction programs, each named Mextra: the MIT version, the UCB version, and the UW-modified UCB version. The next tool in the sequence, Presim, can accept the output format of the MIT version and of the UW-modified UCB version. At NPS, the UCB version is installed and was used. The MIT and UW-modified UCB versions differ in the order of the parameters in a transistor specification. Professor Annaratone at CMU developed a program, cformat, to change a .sim file generated by the UCB version to the MIT format. However, cformat does not work if the -o option is used with Mextra. To avoid a loss of accuracy, the .sim file can instead be changed manually to the UW-modified UCB format. The
B. DESIGN FOR TESTABILITY

Another primary objective of the adder design was to provide for testability, that is, the ability to logically detect fabrication errors or circuit malfunctions rather than visually searching for faults with a microscope.

As the complexity of integrated circuits has grown, the ability to logically detect faults using only the normally available inputs and outputs has decreased markedly. As complexity increases, the number of likely faults to be tested for and the number of input vectors required to isolate a specific fault grow rapidly. Unless a design technique is used that allows the tester to examine the interior logic of a chip, the number of input vectors required to perform useful logical testing is prohibitive. Thus, if logical testability is desired, a design technique that provides for it must be used.

One such design technique is level sensitive scan design (LSSD) [Ref. 13]. Level sensitive implies that the output of any logic element depends only on the levels of its inputs; no logic element is allowed to depend on a transition, as an edge-triggered flip-flop does. Scan design implies that every memory element in the design has an auxiliary function in which its contents are serially fed to an output pad for examination. This gives a tester the ability to examine intermediate results. In applying the LSSD technique to the adder design, the following steps were taken.
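The scan function described in this paragraph can be sketched as a shift register. The `ScanChain` class below is hypothetical and ignores LSSD's clocking details. In normal mode, every memory element captures its logic value in parallel. In scan mode, the contents shift one position per clock toward a single output pad, where a tester reads them.

```python
class ScanChain:
    """Hypothetical model of memory elements wired into one serial scan path."""

    def __init__(self, length):
        self.bits = [0] * length      # bits[-1] drives the scan-out pad

    def capture(self, values):
        """Normal mode: every element loads its logic value in parallel."""
        if len(values) != len(self.bits):
            raise ValueError("one value per memory element")
        self.bits = list(values)

    def shift(self, scan_in=0):
        """Scan mode: one shift clock; returns the bit appearing at the pad."""
        out = self.bits[-1]
        self.bits = [scan_in] + self.bits[:-1]
        return out

    def unload(self):
        """Shift the whole chain out, in the order the pad sees the bits."""
        return [self.shift() for _ in range(len(self.bits))]

chain = ScanChain(4)
chain.capture([1, 0, 1, 1])   # e.g. four intermediate results latched after an event
print(chain.unload())         # [1, 1, 0, 1]
```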

First, all circuits were designed to respond to the level of their inputs rather than requiring a transition to trigger their operation. Second, to ensure that each logic event worked only with stable, non-fluctuating input levels, the inputs to each event were gated. The input gates were
Now, after three events, estimated sums for each 4-bit block and the actual carry into each block (Cinb) are available. From these the sum can be computed using equations 4.17 through 4.20.

$$S(1) = C_{inb} \oplus \mathit{ES}(1) \tag{4.17}$$

$$S(2) = [C_{inb}\,\mathit{ES}(1)] \oplus \mathit{ES}(2) \tag{4.18}$$

$$S(3) = [C_{inb}\,\mathit{ES}(1)\,\mathit{ES}(2)] \oplus \mathit{ES}(3) \tag{4.19}$$

$$S(4) = [C_{inb}\,\mathit{ES}(1)\,\mathit{ES}(2)\,\mathit{ES}(3)] \oplus \mathit{ES}(4) \tag{4.20}$$
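Putting the pieces together, the behavioral Python sketch below (not the circuit) runs the four-event second level CLA schedule using equations 4.1 through 4.20 as numbered in this chapter. It then checks the result against ordinary integer addition. Bit lists are least significant first, so index 0 holds bit (1). Equation 4.7 is evaluated as a flat sum of products. The function names are hypothetical.

```python
import random

def and_all(xs):
    """Logical AND of a list of bits (1 for an empty list)."""
    return int(all(xs))

def cla16(a, b, cin):
    """Add two 16-bit numbers in the four logical events of second level CLA."""
    A = [(a >> k) & 1 for k in range(16)]       # index 0 holds bit (1), the LSB
    B = [(b >> k) & 1 for k in range(16)]
    # Event 1: propagate and generate (eqns 4.1, 4.2).
    P = [x ^ y for x, y in zip(A, B)]
    G = [x & y for x, y in zip(A, B)]
    # Event 2: block primitives (4.5, 4.6), intermediate sums and IC23 (4.8-4.12).
    BP, BG, IES, IC23 = [], [], [], []
    for j in range(4):
        p, g = P[4 * j:4 * j + 4], G[4 * j:4 * j + 4]    # p[0] is bit (1) of block j
        BP.append(and_all(p))
        BG.append(g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2]) | (g[0] & p[3] & p[2] & p[1]))
        IES.append([p[0], p[1] ^ g[0], p[2], p[3] ^ g[2]])
        IC23.append(g[1] | (g[0] & p[1]))
    # Event 3: block carries as one sum of products (4.7) and estimated sums (4.13-4.16).
    BC = []
    for j in range(4):
        terms = [cin & and_all(BP[:j + 1])]
        terms += [BG[k] & and_all(BP[k + 1:j + 1]) for k in range(j + 1)]
        BC.append(int(any(terms)))
    ES = [[e[0], e[1], ic ^ e[2], (e[2] & ic) ^ e[3]] for e, ic in zip(IES, IC23)]
    # Event 4: correct each estimated sum with the carry into its block (4.17-4.20).
    S = []
    for j in range(4):
        cinb = cin if j == 0 else BC[j - 1]
        e = ES[j]
        S += [cinb ^ e[0],
              (cinb & e[0]) ^ e[1],
              (cinb & e[0] & e[1]) ^ e[2],
              (cinb & e[0] & e[1] & e[2]) ^ e[3]]
    return sum(bit << k for k, bit in enumerate(S)), BC[3]     # (sum, Cout)

# Compare with ordinary integer addition on random operands.
for _ in range(5000):
    a, b, cin = random.getrandbits(16), random.getrandbits(16), random.getrandbits(1)
    total = a + b + cin
    assert cla16(a, b, cin) == (total & 0xFFFF, total >> 16)
```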

Using second level CLA logic, the 16-bit sum is generated in only four events. Additionally, this design can easily be extended to the generation of 64-bit sums. The logic of equations 4.5 and 4.6, which produced the second level primitives BP and BG, can be used again to generate third level primitives, B3P and B3G. These third level primitives represent the carry propagate and carry generate properties of 16-bit slices. The carry into each 16-bit block is provided by implementing equation 4.7. Thus, adding one event will provide the carry into each of the four 16-bit blocks of a 64-bit sum. The logic of equation 4.3 is then used to generate the carry into each 4-bit block of the sum, and the final sum is computed as before. The final result is that by adding two events, for a total of six, and using the same logic as before (i.e., no new circuits need to be designed), the 16-bit adder can be extended to a 64-bit adder.
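The reuse of equations 4.5 and 4.6 at the third level can be checked in software. In this Python sketch (hypothetical names), `group` implements equations 4.5 and 4.6 for any four (propagate, generate) pairs. Applying it to bits gives BP and BG; applying it again to blocks gives B3P and B3G. For brevity, the carries between 16-bit blocks are evaluated as a ripple, which yields the same values as the single sum-of-products form of equation 4.7.

```python
import random

def group(pg):
    """Combine four (propagate, generate) pairs, least significant first,
    with the logic of equations 4.5 and 4.6."""
    (p1, g1), (p2, g2), (p3, g3), (p4, g4) = pg
    bp = p1 & p2 & p3 & p4
    bg = g4 | (g3 & p4) | (g2 & p4 & p3) | (g1 & p4 & p3 & p2)
    return bp, bg

def carries_into_16bit_blocks(a, b, cin):
    """Carry into each 16-bit block of a 64-bit add, from third level primitives."""
    pg = [(((a ^ b) >> k) & 1, ((a & b) >> k) & 1) for k in range(64)]   # P(i), G(i)
    level2 = [group(pg[4 * j:4 * j + 4]) for j in range(16)]        # BP, BG of 4-bit blocks
    level3 = [group(level2[4 * j:4 * j + 4]) for j in range(4)]     # B3P, B3G of 16-bit blocks
    carries, c = [cin], cin
    for b3p, b3g in level3[:3]:
        c = b3g | (b3p & c)        # ripple form; same values as the sum of products
        carries.append(c)
    return carries

def low_bits(x, n):
    return x & ((1 << n) - 1)

for _ in range(2000):
    a, b, cin = random.getrandbits(64), random.getrandbits(64), random.getrandbits(1)
    expected = [(low_bits(a, 16 * j) + low_bits(b, 16 * j) + cin) >> (16 * j) for j in range(4)]
    assert carries_into_16bit_blocks(a, b, cin) == expected
```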
because of the large amount of area consumed by the registers needed to hold two possible answers. The second method is to compute the estimated sum of the block assuming a carry-in of 0 and then to correct the estimated sum once the actual carry-in to each block is known.

Since the estimated sum, ES(i), is not needed until after the third event, and computing it in one event again leads to fanout problems, ES(4), the most significant bit, through ES(1) are computed in two events as follows. First, an intermediate estimated sum, IES(i), is computed using two-bit slices, each assuming a carry-in of 0 (see equations 4.8 through 4.11). At the same time, the carry from bit(2) into bit(3), IC23, is computed using equation 4.12. On the next event, ES(i) is computed from the IES(i)'s and IC23 using equations 4.13 through 4.16.

$$\mathit{IES}(1) = P(1) \tag{4.8}$$

$$\mathit{IES}(2) = P(2) \oplus G(1) \tag{4.9}$$

$$\mathit{IES}(3) = P(3) \tag{4.10}$$

$$\mathit{IES}(4) = P(4) \oplus G(3) \tag{4.11}$$

$$\mathit{IC}_{23} = G(2) + G(1)\,P(2) \tag{4.12}$$

$$\mathit{ES}(1) = \mathit{IES}(1) \tag{4.13}$$

$$\mathit{ES}(2) = \mathit{IES}(2) \tag{4.14}$$

$$\mathit{ES}(3) = \mathit{IC}_{23} \oplus \mathit{IES}(3) \tag{4.15}$$

$$\mathit{ES}(4) = [\mathit{IES}(3)\,\mathit{IC}_{23}] \oplus \mathit{IES}(4) \tag{4.16}$$
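Equation 4.16 works because P(3) and G(3) cannot both be 1. The carry into bit (4), G(3) + P(3) IC23, therefore equals G(3) ⊕ P(3) IC23. Its XOR with P(4) then splits into IES(4) ⊕ [IES(3) IC23]. The exhaustive Python check below (hypothetical helper name) confirms that equations 4.8 through 4.16 reproduce the 4-bit sum with a carry-in of 0.

```python
from itertools import product

def estimated_sum(a_bits, b_bits):
    """ES(1)..ES(4) of one 4-bit block with carry-in 0, via eqns 4.8-4.16.
    Bit lists are least significant first: index 0 is bit (1)."""
    P = [x ^ y for x, y in zip(a_bits, b_bits)]                 # eqn 4.1
    G = [x & y for x, y in zip(a_bits, b_bits)]                 # eqn 4.2
    # First event: two-bit slices and the carry from bit (2) into bit (3).
    ies = [P[0], P[1] ^ G[0], P[2], P[3] ^ G[2]]                # eqns 4.8-4.11
    ic23 = G[1] | (G[0] & P[1])                                  # eqn 4.12
    # Second event: fold IC23 into the upper two-bit slice.
    return [ies[0], ies[1], ic23 ^ ies[2], (ies[2] & ic23) ^ ies[3]]   # eqns 4.13-4.16

# Exhaustive check against (A + B) mod 16 for all 256 operand pairs.
for a, b in product(range(16), repeat=2):
    es = estimated_sum([(a >> k) & 1 for k in range(4)], [(b >> k) & 1 for k in range(4)])
    assert sum(bit << k for k, bit in enumerate(es)) == (a + b) % 16
print("eqns 4.8-4.16 agree with (A + B) mod 16")
```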
are the block propagate, BP(i), and block generate, BG(i), functions. BP(i) = 1 implies that a carry into block (i) will be propagated through to block (i+1). BG(i) = 1 implies that block (i) will generate a carry into block (i+1). For a 4-bit block where bit(1) is the least significant bit, the BP and BG primitives are generated by equations 4.5 and 4.6 respectively, with the P(i)'s and G(i)'s computed as before.

$$\mathit{BP}(i) = P(1)\,P(2)\,P(3)\,P(4) \tag{4.5}$$

$$\mathit{BG}(i) = G(4) + G(3)\,P(4) + G(2)\,P(4)\,P(3) + G(1)\,P(4)\,P(3)\,P(2) \tag{4.6}$$


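Next, the block carry, BC(i), which represents the carry from block (i) into block (i+1), is computed using equation 4.7, which represents the same logic as equation 4.3:

$$\mathit{BC}(i) = \mathit{BG}(i) + \mathit{BG}(i-1)\,\mathit{BP}(i) + \cdots + \mathit{BG}(1)\,\mathit{BP}(i)\cdots\mathit{BP}(2) + C_{in}\,\mathit{BP}(i)\cdots\mathit{BP}(1) \tag{4.7}$$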


So far, after three events, the P(i)'s, G(i)'s, BP(i)'s, BG(i)'s, and BC(i)'s have been generated. If the final sum were generated by the same method as in zero level CLA, two additional events would be required. The first again applies the logic of equation 4.3 to each 4-bit block to generate the carry into each bit; here the Cin for block (i) is given by BC(i-1). The second cycle is used to generate the sum from the C(i)'s and P(i)'s. One of these events can be eliminated if, while the BC(i)'s and their predecessors are being computed, an estimated sum of the 4-bit block is also computed. One method is to compute two estimated sums for each block, one assuming a carry into the block of 0 and the other assuming a carry-in of 1. When the correct carry-in for block (i) is generated, it is used to multiplex the correct sum for the block to the output. This assumed carry method was rejected
2. First Level CLA Logic

Noting that a four-bit sum generated using zero level CLA logic is within the design guidelines suggests cascading 4-bit slices of the same logic, as indicated in Table 2.

TABLE 2

First Level CLA Logic for a 16-bit Sum

Event   Bits 1-4            Bits 5-8            Bits 9-12           Bits 13-16
1       Compute P(i),G(i)   Compute P(i),G(i)   Compute P(i),G(i)   Compute P(i),G(i)
2       Compute C(i)        Delay P(i),G(i)     Delay P(i),G(i)     Delay P(i),G(i)
3       Compute S(i)        Compute C(i)        Delay P(i),G(i)     Delay P(i),G(i)
4       Delay S(i)          Compute S(i)        Compute C(i)        Delay P(i),G(i)
5       Delay S(i)          Delay S(i)          Compute S(i)        Compute C(i)
6       Delay S(i)          Delay S(i)          Delay S(i)          Compute S(i)

Here the sum is available after six events and the fanout is reduced by a factor of four. The reduction in event cycle time would more than make up for the increase in event count, since cycle time grows faster than linearly with fanout. The only drawback of this design lies in the cost of extending it to generate 32-bit or 64-bit sums. For every 4-bit slice added, another event is required. Thus, a 64-bit add would require 12 more events, for a total of 18.
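The event counts follow directly from Table 2: one event for the P(i)'s and G(i)'s, one carry event per 4-bit slice, and a final sum event for the last slice. The short Python function below (hypothetical name) makes the growth explicit. Twelve added slices for 64 bits mean twelve more events than the 16-bit case.

```python
def first_level_events(nbits):
    """Events for cascaded zero level 4-bit slices, counted as in Table 2."""
    slices = nbits // 4
    return 1 + slices + 1      # P/G event, one carry event per slice, last sum event

for nbits in (16, 32, 64):
    print(nbits, first_level_events(nbits))
# 16 6
# 32 10
# 64 18
```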

3. Second Level CLA Logic

Again the data is divided into 4-bit slices called blocks. But rather than let the carries ripple through the blocks, two new primitive functions are introduced. They



most basic definition is a combinational logic circuit accepting a set of inputs, performing its specified operations on those inputs, and generating a set of outputs. Therefore, the input of the addends, followed by the computation and output of the sum, can be considered a logical event. However, a primary design consideration for the adder is to provide for testability, and a key element of this provision is the availability of intermediate results (see section B of this chapter). This implies breaking up the sum generation into several separate events. The first event takes the addends as inputs, performs some logic operation(s) on them, and stores the results in a register. The next event takes its inputs from that register and stores its results in another register. This chain continues until the last event deposits the sum on the output pads of the chip. To provide the tester with easily interpreted intermediate results, the equations presented in this chapter were taken as boundaries for each logical event: the terms on the right side of an equation determine the inputs of a logical event, and the left side terms determine its output. Once all the inputs for an equation have been generated by previous events, the logic of the equation becomes part of the current event.

1. Zero Level CLA Logic

This logic requires three events to generate the sum. First, equations 4.1 and 4.2 are used to generate the P(i)'s and G(i)'s. Second, the C(i)'s are generated from equation 4.3. Finally, the sum is derived from equation 4.4. The principal problem with this approach for a sixteen-bit adder lies in the application of equation 4.3: here the input P(1) has a fanout of 15, which makes this approach unsatisfactory.
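The fanout figure can be reproduced by expanding equation 4.3 symbolically. This Python sketch (hypothetical helper names) counts how many of the carry equations C(2) through C(16) use a given input. It counts equations, not individual gate inputs.

```python
def carry_terms(i):
    """Product terms of eqn 4.3 for C(i), the carry into bit i (bits numbered from 1).
    Each term is a tuple of signals ANDed together; the terms are ORed."""
    terms = []
    for k in range(i - 1, -1, -1):                   # generate position; 0 stands for Cin
        gen = "Cin" if k == 0 else f"G{k}"
        props = tuple(f"P{m}" for m in range(i - 1, k, -1))   # P(i-1) ... P(k+1)
        terms.append((gen,) + props)
    return terms

def fanout_by_equation(signal, nbits=16):
    """Number of carry equations C(2)..C(nbits) whose expansion uses `signal`."""
    return sum(any(signal in t for t in carry_terms(i)) for i in range(2, nbits + 1))

print(carry_terms(3))             # [('G2',), ('G1', 'P2'), ('Cin', 'P2', 'P1')]
print(fanout_by_equation("P1"))   # 15
```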


reversed. This led to the following guidelines in the design of the adder:

1) The internal logic of each stage should be accomplished with minimum-dimension transistors, 3 microns x 4 microns (length x width). This leads to more compact circuits with shorter interconnections and reduces the capacitive load on the preceding stage.

2) Significantly wider transistors (3 microns x 9 microns) should be used at the output of each stage, where the fanout and interconnect loading are greater.

3) The fanout of any transistor should be kept to less than five.

This requires a more complete definition of fanout, because the capacitive loading of a gate depends on its area. A 3-micron x 4-micron transistor driving six other 3-micron x 4-micron transistors has a fanout of six. A 3-micron x 8-micron transistor driving the same load is considered to have a fanout of three. Though this implies that a high fanout problem can be solved by merely increasing the width of the driving transistor, it neglects the effects of the interconnect wiring. As gates are added to the load of a transistor, each subsequent addition must be more remote from the driving transistor. Since the resistance of the wiring is proportional to its length and inversely proportional to its width, the resistance of the wiring will increase unless the width is also increased. However, since the capacitance of the wiring is proportional to its area, most of the gain achieved by widening the wire to reduce resistance is offset by the increase in capacitance. As a result, in the design of the adder, increasing the width of the driving transistor was not viewed as a complete fix for a fanout problem.
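One reading of this definition that is consistent with both examples measures load in units of the driver's own gate area. The Python sketch below (hypothetical name) ignores interconnect, which the paragraph above identifies as the limit of this approach.

```python
def effective_fanout(driver_lw, loads_lw):
    """Total load gate area divided by the driver's gate area.
    Dimensions are (length, width) in microns; wiring is ignored."""
    dl, dw = driver_lw
    return sum(l * w for l, w in loads_lw) / (dl * dw)

six_minimum_gates = [(3, 4)] * 6
print(effective_fanout((3, 4), six_minimum_gates))   # 6.0
print(effective_fanout((3, 8), six_minimum_gates))   # 3.0
```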

For the comparison of the different approaches to CLA addition, the term logical event needs to be defined. The
As pointed out by Flores [Ref. 12] and by Conradi and Hauenstein [Ref. 3], there are several logical implementations of carry look-ahead addition. A principal task of this thesis investigation was to select a fast logical design. Without the circuit simulator SPICE, the analysis of each design considered was more qualitative than quantitative. In this qualitative analysis, a turned-on transistor is considered as a resistor with its resistance proportional to its length and inversely proportional to its width. All gates driven by such a turned-on transistor are considered to be capacitive loads with capacitance proportional to the area of the gate. The interconnect wiring is considered to add both parallel capacitive loading and series resistance, as shown in Figure 4.1.
Figure 4.1 CMOS Output Loading Model (elements labeled Rtrans, Rwire, Cwire, S2).
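To put numbers on this qualitative model, one common estimate is the Elmore delay (a technique not used in the original analysis). It sums, over each capacitance, the resistance between that capacitance and the driver. The Python sketch assumes a ladder of identical wire segments with one gate load per segment. The resistances and capacitances are hypothetical. Delay grows with both the amount of wiring and the number of loads.

```python
def elmore_delay(r_trans, segments):
    """Elmore delay (seconds) to the far end of an RC ladder driven through r_trans.

    segments: (r_wire, c_node) pairs, nearest the driver first. c_node lumps the
    wire capacitance of that segment with the gate load attached there.
    """
    delay, r_path = 0.0, r_trans
    for r_wire, c_node in segments:
        r_path += r_wire              # resistance from the driver to this node
        delay += r_path * c_node      # each capacitor charges through its path resistance
    return delay

# Hypothetical values: 10 kilohm driver, 100 ohm and 10 fF per load position.
for n_loads in (3, 6):
    d = elmore_delay(10e3, [(100.0, 10e-15)] * n_loads)
    print(n_loads, f"{d * 1e9:.3f} ns")
# 3 0.306 ns
# 6 0.621 ns
```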

From this model it is clear that the amount of interconnect wiring and the number of gates driven (fanout) should be minimized in order to minimize the output transition time when the positions of switches S1 and S2 of Figure 4.1 are
B1, and Cin to produce S1 and C1out (the carry out of bit one into bit two). On clock cycle 2 it uses A2, B2, and C1out to generate S2 and C2out. Here 16 clock cycles elapse before the sum is available. An adder can also be implemented as a ripple carry adder, where the duration of each clock pulse is sufficient to allow a carry into the sum to propagate all the way through to a carry out. In the case of the 16-bit adder, this would require a clock duration of at least sixteen times the gate delay of the one-bit adder. The middle ground belongs to the carry look-ahead adder [Ref. 3]. In carry look-ahead (CLA) addition the carry into each bit position, C(i), is generated from the propagate, P(i), and generate, G(i), primitives:

$$P(i) = A(i) \oplus B(i) \tag{4.1}$$

$$G(i) = A(i)\,B(i) \tag{4.2}$$

P(i) = 1 implies that a carry into bit(i) will be propagated through to bit(i+1). G(i) = 1 implies that A(i) and B(i) will provide a carry into bit(i+1) of the sum, regardless of the contents of the less significant bits of A and B. The carries and sum bits then follow from

$$C(i) = G(i-1) + G(i-2)\,P(i-1) + \cdots + C_{in}\,P(i-1)\cdots P(2)\,P(1) \tag{4.3}$$

$$S(i) = C(i) \oplus P(i) \tag{4.4}$$

The algorithm for the CLA sum generation is as follows. The first event is the evaluation of equations 4.1 and 4.2 to generate the P(i) and G(i) primitives. The second event uses the P(i) and G(i) primitives as inputs to equation 4.3 to generate the C(i)'s. The final event is the computation of the S(i)'s from equation 4.4.
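The three events can be written out directly. The Python sketch below (hypothetical `cla_add`) follows equations 4.1 through 4.4 bit for bit, with index 0 holding bit (1), and checks itself against ordinary integer addition.

```python
import random

def cla_add(a, b, cin, n=16):
    """n-bit carry look-ahead addition organized as the three events of eqns 4.1-4.4."""
    A = [(a >> k) & 1 for k in range(n)]        # index 0 holds bit (1), the LSB
    B = [(b >> k) & 1 for k in range(n)]
    # Event 1: propagate and generate primitives (eqns 4.1, 4.2).
    P = [x ^ y for x, y in zip(A, B)]
    G = [x & y for x, y in zip(A, B)]
    # Event 2: every carry as one sum of products (eqn 4.3).
    # C[i] is the carry into index i; C[n] is the carry out.
    C = []
    for i in range(n + 1):
        terms = [cin & int(all(P[:i]))]                              # Cin P(i-1)...P(1)
        terms += [G[k] & int(all(P[k + 1:i])) for k in range(i)]     # G(k) P(i-1)...P(k+1)
        C.append(int(any(terms)))
    # Event 3: sum bits (eqn 4.4).
    S = [c ^ p for c, p in zip(C, P)]
    return sum(s << k for k, s in enumerate(S)), C[n]

for _ in range(5000):
    a, b, cin = random.getrandbits(16), random.getrandbits(16), random.getrandbits(1)
    total = a + b + cin
    assert cla_add(a, b, cin) == (total & 0xFFFF, total >> 16)
```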


IV.  DESIGN  OF  THE  ADDER 


As stated in the introduction, the primary goals of the
adder design are to maximize throughput and to provide for
testability. The adder is to be a pipelined adder. Every
clock cycle it should accept as inputs two 16-bit addends
(A1, the least significant bit, through A16, and B1, the
least significant bit, through B16) and one carry-in (Cin)
bit. It is desired to produce the 16-bit sum (S1, the least
significant bit, through S16) and the carry-out (Cout) bit
as quickly as possible. Both the number of clock cycles
from input of the addends to output of the sum and the
duration of each clock cycle are to be minimized. A secondary
consideration in the design is expandability. An
expandable design is one that can easily be extended to
produce a 32-bit or 64-bit sum utilizing the same circuit
structures. In this chapter the logical design and layout
design of the 16-bit adder are presented. The equations
presented in this chapter are taken or derived from equations
found in chapters three through six of The Logic of
Computer Arithmetic by Flores [Ref. 12]. In these equations
concatenation implies the logical AND, the symbol + implies
the logical OR, and the symbol ⊕ implies the logical XOR.

A.  LOGICAL  DESIGN 

In considering the speed spectrum of adders from a
logical standpoint, at the fast end there is the table
look-up. With 33 binary inputs and 17 outputs, this would
require an address space of 2^33 (about 8.6 billion) 17-bit
words. With current technology this is not feasible. At the
other end of the spectrum is the serial adder. On clock
cycle 1 it uses A1,


protection circuit. In the extraction and simulation
process this resistor is viewed as an open circuit.
Therefore, on input pads, the input label must be placed
after the resistor in the signal path.

With Caesar, Lyra, and RNL, a designer at NPS has
the requisite CAD tools for the complete logical circuit
design loop. With these tools, circuits that are free of
design rule errors and produce the desired logical results
can be designed. The lack of SPICE somewhat restricts the
designer's ability to optimize speed, but there are several
design techniques that can be employed to design chips that
run fast. These will be covered in the next chapter.
the parentheses of another command), it may be written in
the more natural form:

function argument argument ... <newline>

Tutorials on RNL are contained in the University of
Washington/Northwest VLSI Consortium's VLSI Design Tools
Reference Manual [Ref. 11]. There are two points concerning
the Mextra, Presim, RNL simulation cycle that are not brought
out in the documentation and of which a user should be
aware. The first concerns the use of vectors in RNL
commands. As evidenced in the tutorials of Reference 11 and
the adder simulation results in Appendix D, vectors can be
used to make the input and output of RNL less cumbersome and
verbose. After a vector has been defined, a user will
then want to assign values to it. The documentation shows
the format of the vector value assignment command to be:

(invec '(vecname values))

However, the "values" field has its own specific format.
The first character should be a 0 or a 1, indicating positive
and negative numbers, respectively. The LISP interpreter
will work with negative numbers, but RNL will not accept
negative numbers as logical inputs. The second character is
a letter specifying the number base of the input vector (b
for binary, h for hexadecimal). For example, to assign the
binary value +101010 to the vector vectone, the RNL command
would be:

(invec '(vectone 0b101010))
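Building the value field by hand is error-prone when many vectors must be set. The Python sketch below assembles the command text from an integer. `invec_command` is a hypothetical helper, not part of RNL, and writing hexadecimal digits in lower case is an assumption.

```python
def invec_command(vecname: str, value: int, base: str = "b") -> str:
    """Build an RNL vector assignment such as (invec '(vectone 0b101010)).

    The value field is: a sign digit (0 = positive, 1 = negative), a base
    letter (b = binary, h = hexadecimal), then the digits themselves.
    """
    if value < 0:
        # RNL will not accept negative numbers as logical inputs.
        raise ValueError("RNL logical inputs must be non-negative")
    if base == "b":
        digits = format(value, "b")
    elif base == "h":
        digits = format(value, "x")  # lower-case hex digits assumed
    else:
        raise ValueError("base must be 'b' or 'h'")
    return f"(invec '({vecname} 0{base}{digits}))"


print(invec_command("vectone", 0b101010))  # (invec '(vectone 0b101010))
```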

The other point concerns the location of input
labels on the input pads. When the entire chip is being
simulated, the input labels are normally placed on the metal
pads where the off-chip leads are attached. Before an input
signal from a bonding pad reaches the interior circuits of a
chip it must pass through a resistor in an overvoltage
interactively. If the second Unix command is used, specifying
a command file, RNL first executes all the commands in
cmdfile and, upon completion, starts taking commands from the
console. In either case, RNL should be given the following
commands:

(load "uwstd.l")

(load "uwsim.l")

(read-network "basename")

where basename is the file generated by Presim. The first
two commands load RNL with several macros which simplify
user interfacing with RNL.
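Because the same three commands open every session, they can be placed at the top of the cmdfile given to `rnl cmdfile`. The Python sketch below writes such a file. The helper names `rnl_preamble` and `write_rnl_cmdfile`, and the option of appending further commands, are assumptions for illustration. Only the three commands shown above come from the text.

```python
from pathlib import Path
from typing import List, Optional


def rnl_preamble(basename: str) -> List[str]:
    """Commands every RNL session should start with (taken from the text)."""
    return [
        '(load "uwstd.l")',
        '(load "uwsim.l")',
        f'(read-network "{basename}")',  # binary network file written by Presim
    ]


def write_rnl_cmdfile(path: str, basename: str, extra: Optional[List[str]] = None) -> None:
    """Write a command file for `rnl cmdfile`: the preamble, then any extra commands."""
    lines = rnl_preamble(basename) + list(extra or [])
    Path(path).write_text("\n".join(lines) + "\n")
```

For example, `write_rnl_cmdfile("cmdfile", "adder")`, with "adder" standing in for a hypothetical basename, followed by `rnl cmdfile` would start RNL with the network already loaded.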

The user interface with RNL is a LISP interpreter.
The interpreter continuously executes the loop: (1) read a
command, (2) evaluate the command and perform the specified
actions, and (3) print the result. There are two formats
for specifying commands to this loop. The first is:

(function argument argument ... argument)

Here the parentheses delimit the command and spaces separate
the elements. The interpreter reads the entire command, up
to the closing parenthesis; the first element is then
interpreted as a function and all the others as arguments.
The arguments may themselves be of the same command form,
(function arg arg ... arg). If the following command were
issued to RNL,

(* 12 (+ 2 2) (/ 14 7))

RNL would respond by typing 96 (12*4*2). The other format
for commands to RNL is

(function '(argument argument ... argument))

where the "'" indicates the quote special form, which keeps
its argument from being evaluated. For example, (+ 2 3)
evaluates to 5, but '(+ 2 3) is a list of three elements.
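The read-evaluate-print cycle and the effect of the quote form can be illustrated with a toy evaluator. The Python sketch below is not RNL's LISP interpreter. It is a hypothetical stand-in that understands only integers, the four arithmetic functions used in the examples above (integer division is assumed for `/`), and the `'` quote shorthand. That is enough to reproduce both examples.

```python
import operator
from functools import reduce

# The four arithmetic functions used in the examples above.
FUNCS = {
    "+": lambda *xs: sum(xs),
    "-": lambda x, *xs: x - sum(xs) if xs else -x,
    "*": lambda *xs: reduce(operator.mul, xs, 1),
    "/": lambda x, *xs: reduce(operator.floordiv, xs, x),  # integer division
}


def tokenize(text: str) -> list:
    for ch in "()'":
        text = text.replace(ch, f" {ch} ")
    return text.split()


def read(tokens: list):
    """Read one expression, consuming tokens from the front of the list."""
    tok = tokens.pop(0)
    if tok == "'":
        return ["quote", read(tokens)]  # 'x is shorthand for (quote x)
    if tok == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(read(tokens))
        tokens.pop(0)  # discard the closing parenthesis
        return expr
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol such as + or *


def evaluate(expr):
    """Evaluate the arguments first, then apply the function named by the first element."""
    if isinstance(expr, list):
        if expr[0] == "quote":
            return expr[1]  # the quoted argument is returned unevaluated
        args = [evaluate(arg) for arg in expr[1:]]
        return FUNCS[expr[0]](*args)
    return expr


def rep(text: str):
    return evaluate(read(tokenize(text)))


print(rep("(* 12 (+ 2 2) (/ 14 7))"))  # 96
print(rep("'(+ 2 3)"))                  # ['+', 2, 3]
```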
When this second RNL command format is not used to represent
an argument of another command (i.e. is not contained within

first step in this format change is to use the vi text
editor to add "format: UCB" to the header line of
basename.sim. The other change that needs to be made is to
change the labels for the n-channel transistors from "n" to
"e". Using the EX editor, the following steps accomplish
this:


% e basename.sim      - invokes the editor
:g/^n/s//e/g          - make global change: for all n as first
                        char in a line, change to e
:w                    - write back edited file
:q                    - exit editor
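The layout is re-extracted many times during the design-simulate-redesign loop, so the same two edits can also be scripted. The Python sketch below shows one way to do this under stated assumptions:

- The header is taken to be the first line of basename.sim and to begin with `|`.
- `format: UCB` is simply appended to that header line.
- Every line whose first character is `n` is treated as an n-channel transistor line, mirroring the ex substitution above.

The names `convert_sim` and `convert_sim_file` are hypothetical.

```python
from pathlib import Path


def convert_sim(text: str) -> str:
    """Apply the two .sim edits described above and return the new text."""
    lines = text.splitlines()
    # Assumption: the header is the first line and begins with '|'.
    if lines and lines[0].startswith("|") and "format:" not in lines[0]:
        lines[0] += " format: UCB"
    # Same effect as the ex command :g/^n/s//e/g: a leading 'n' becomes 'e'.
    lines = ["e" + line[1:] if line.startswith("n") else line for line in lines]
    return "\n".join(lines) + "\n"


def convert_sim_file(path: str) -> None:
    """Rewrite a .sim file (e.g. basename.sim) in place."""
    p = Path(path)
    p.write_text(convert_sim(p.read_text()))
```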


The next step is to create a binary file for RNL
from basename.sim using Presim. This is done by issuing the
command:

% presim basename.sim basename config

Basename.sim is the edited .sim file and basename is the
file into which Presim writes its binary output. Config is
the calibration file used to select other than default
values for the circuit element capacitance and resistance.
A copy of the Presim user's guide from the UW/NW VLSI
Consortium release 2.0 and the calibration file used in
simulating the adder are contained in Appendix C. The
values used in the calibration file are taken from the
MOSIS-supplied electrical parameters.

The final step is to run RNL itself. This is done
by entering one of the following two Unix commands:

% rnl        or

% rnl cmdfile

where cmdfile is the name of a file containing a sequence of
RNL commands. Entering the first Unix command will cause
RNL to take its commands directly from the console


opened only after the inputs were stable (i.e. the outputs
of the previous event were stable) and closed before the
input gates of the previous event were opened. Third, a
dual mode latch was used to store the output of each logic
event. In the normal mode of operation, the register
latches the outputs of one logic event in parallel and
stores them to be used as inputs for the next logic event.
In its secondary mode of operation, the register stops
taking its parallel inputs and starts to run as a shift
register, shifting its contents onto an output pad.
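The two register modes can be summarized in a behavioural model. The Python sketch below is an abstraction rather than a circuit description: one call to `clock()` stands for a full two-phase cycle, and the class name `DualModeRegister` is hypothetical. The bit ordering follows the text: the rightmost bit is always available at the register's output pad, and the shift-in of the leftmost latch is tied to ground.

```python
class DualModeRegister:
    """Behavioural model of an LSSD dual mode register.

    One call to clock() stands for one full two-phase clock cycle.
    bits[0] is the leftmost latch; bits[-1] is the rightmost latch,
    whose value is always visible at the register's output pad.
    """

    def __init__(self, width: int):
        self.bits = [0] * width

    @property
    def serial_out(self) -> int:
        return self.bits[-1]

    def clock(self, parallel_in: list[int], control: int) -> None:
        if control == 0:
            # Normal mode: latch the previous event's outputs in parallel.
            if len(parallel_in) != len(self.bits):
                raise ValueError("parallel input width does not match register")
            self.bits = list(parallel_in)
        else:
            # Shift mode: parallel inputs are ignored; each latch takes the value
            # of the latch to its left, and the leftmost shift-in is tied to ground.
            self.bits = [0] + self.bits[:-1]


def scan_out(reg: DualModeRegister) -> list[int]:
    """Hold control at 1 and sample the output pad once per cycle."""
    samples = []
    for _ in range(len(reg.bits)):
        samples.append(reg.serial_out)
        reg.clock([], control=1)
    return samples


reg = DualModeRegister(4)
reg.clock([1, 0, 1, 1], control=0)  # normal mode: capture one event's outputs
print(scan_out(reg))                 # [1, 1, 0, 1]: rightmost bit first
```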

One of the consequences of using the LSSD technique is
the large amount of area consumed by the dual mode registers.
In high-speed operation, an inverter pair would be
sufficient to store inter-event results. But to permit
low-speed testing, where the capacitance of a gate may
discharge during one clock phase, and to provide the dual mode
feature, a pair of clocked latches with control circuits is
required.

C.  LAYOUT  DESIGN 

With the logic decided upon, the next step was to create
the layout of the adder. The logic consisted of four events
to produce the sum. Another event was needed to latch the
input data onto the chip. A two-phase clock was needed to
ensure that two adjacent events did not run simultaneously
(ensuring stable inputs to each event). To make the output
of the adder compatible with the input to another adder, a
one-event delay was added. This ensures that the output of
one adder does not change while a second adder is using the
sum from the first as an input. With two 16-bit addend
inputs, one carry-in input, one power supply (Vdd) input,
one reference (GND) input, a 16-bit sum output, one
carry-out output, and two clock inputs, ten pads were left
from a standard 64-pin chip for register mode control inputs
and register (shift mode) outputs. Since the design called
for five registers, one for each logic event and one for
latching the input data, five pads were used for input of
the register mode control signals and five were used for the
registers to serially output their contents. With the
required inputs and outputs identified, the preliminary
floor plan shown in Figure 4.2 was created.
The first circuit designed was the dual mode latch of
Figure 4.3. Here the circuit is designed to latch the IN
level when Control is low (its complement is high) and phi1
is high (its complement is low). When phi1 goes low, a copy
of the input is also stored in the second latch and becomes
available at shift-out, which is connected to shift-in of the
next latch. When Control goes high, the IN signal is blocked
and the latch takes its input from the register to the left.
The shift-in of the leftmost latch in a register is tied to
ground. Versatec plots of the actual layouts of this dual
mode latch and the other circuits described in this section
are given in Appendix E.


Figure 4.3 Dual Mode Latch.

The AND gate used was constructed from a NAND gate
followed by an inverter, as shown in Figure 4.4. Similarly,
the OR gate was constructed from a NOR gate followed by an
inverter (see Figure 4.5). Although logic implemented using
these AND and OR gates consumes more area than the same
logic implemented in NAND and NOR gates only, the penalty is
not severe because these gates were used infrequently in the
final design.


Figure 4.4 AND Gate.


The exclusive OR (XOR) gate was constructed from two
inverters and three NAND gates, as shown in Figure 4.6.
Though this design consumes considerably more area than
the XOR gate of Figure 3.1, it was selected because the RNL
circuit simulator could correctly model its operation.


Figure 4.6 Exclusive OR Gate.
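The text does not spell out how the two inverters and three NAND gates are connected. One standard arrangement, assumed here and not read from Figure 4.6, is A ⊕ B = NAND(NAND(A, not B), NAND(not A, B)). The Python truth-table check below confirms that arrangement. It also checks the NAND-plus-inverter AND gate and the NOR-plus-inverter OR gate described above.

```python
from itertools import product


def inv(a: int) -> int:
    return 1 - a


def nand(a: int, b: int) -> int:
    return inv(a & b)


def nor(a: int, b: int) -> int:
    return inv(a | b)


def and_gate(a: int, b: int) -> int:
    return inv(nand(a, b))  # NAND followed by an inverter (Figure 4.4)


def or_gate(a: int, b: int) -> int:
    return inv(nor(a, b))   # NOR followed by an inverter (Figure 4.5)


def xor_gate(a: int, b: int) -> int:
    # Two inverters and three NAND gates (assumed wiring).
    return nand(nand(a, inv(b)), nand(inv(a), b))


for a, b in product((0, 1), repeat=2):
    assert and_gate(a, b) == (a & b)
    assert or_gate(a, b) == (a | b)
    assert xor_gate(a, b) == (a ^ b)
print("all gate truth tables match")
```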

More complex logic functions were implemented using
programmed logic arrays (PLAs), whose outputs are the
logical sum (OR) of the products (AND) of the inputs. A
single-phase design was needed: a PLA designed to compute
while phi1 is high, between the time the preceding event
had produced stable outputs (phi2 going low) and the time
phi1 goes low, had to produce the proper sum-of-products
results. To hold down fanout, a dynamic structure was needed
so that inputs could be applied to a single type of
transistor. To prevent steady-state power consumption, a
precharged dynamic structure was needed. Because of charge
sharing, the precharging must take place while the inputs
are present on the transistor gates of the PLA (see chapter
5, section C, for a complete explanation of the charge
sharing problem in this PLA structure). Thus, two distinct
events must occur during this time period. First, the inputs
must be applied and precharging must take place. Then
evaluation must occur.

To cause these two events to occur during a single phase of
the clock, the interphase time when both phi1 and phi2 are
low must be utilized for precharging. The basic structure
of the resulting PLA is shown in Figure 4.7.


Figure 4.7 PLA Structure.
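Logically, each PLA output is an OR of selected AND terms. The Python sketch below models only that logical behaviour, not the precharge and evaluate timing. Each product term maps an input index to the value it requires, and each output lists the product terms it ORs together. As an assumed example, not one of the adder's PLAs, it programs a 2-input, 3-product, 2-output array that produces the P and G primitives of equations 4.1 and 4.2.

```python
from itertools import product

# A product term maps an input index to the literal it requires
# (1 = true input, 0 = complemented input).
ProductTerm = dict[int, int]


def pla_eval(inputs: list[int], and_plane: list[ProductTerm],
             or_plane: list[list[int]]) -> list[int]:
    """Evaluate a PLA: the AND plane forms products, the OR plane sums selected products."""
    products = [
        int(all(inputs[i] == want for i, want in term.items()))
        for term in and_plane
    ]
    return [int(any(products[p] for p in terms)) for terms in or_plane]


# Hypothetical example: inputs (A, B); outputs (P, G) from eqns 4.1 and 4.2.
AND_PLANE = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]  # A~B, ~AB, AB
OR_PLANE = [[0, 1], [2]]                                # P = A~B + ~AB, G = AB

for a, b in product((0, 1), repeat=2):
    p, g = pla_eval([a, b], AND_PLANE, OR_PLANE)
    assert (p, g) == (a ^ b, a & b)
```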


Referring back to the floor plan in Figure 4.2, the
layouts of the circuits which perform the logic of each event
are presented in Appendix E. The names assigned to the
layouts are given below. Event 1 consists of a 33-bit
dual mode latch. Event 2, which computes the P and G
primitives for each bit, is made up of 16 AND gates, 16 XOR
gates, and another 33-bit latch. Event 3, which computes the
BP and BG primitives, the IES(i)'s, and the IC term (IC23,
IC67, IC1011, or IC1415) for each 4-bit block, is made up of
four instances of PLA82 and a 29-bit latch.

The circuit PLA82 is made up of an 8-input, 5-product,
2-output PLA, two XOR gates, one AND gate, and one OR gate.
Event 4, which computes the ES(i)'s and BC for each 4-bit
block, uses four instances of PLA84 to compute the ES(i)'s,
one instance of PLA915 to compute the BC(i)'s, and a 21-bit
latch. The circuit PLA915 is a 9-input, 15-product, 5-output
PLA and the circuit PLA84 is an 8-input, 7-product, 4-output
PLA. Event 5 uses four instances of PLA104 to compute the
S(i)'s and a 17-bit latch to store results and provide the
added delay (by taking the output from the shift-out
position, the extra clock cycle of delay is generated). The
circuit PLA104 is a 10-input, 14-product, 4-output PLA.
With this design, the input-to-output latency is three full
cycles of a two-phase non-overlapping clock; three cycles of
the clock elapse between the time the addends are presented
to the chip and the time the sum becomes available at the
output. In the first three registers the odd number of bits
is due to the need to store the carry-in value until event
4. In the last two registers the odd number of bits is due
to the need to store the computed value of carry-out.

The resulting final layout of Figure 4.8 shows the
actual on-chip layout locations of each event's logic. In
addition to the logic circuits for each event, the circuits
AMP and AMP5 are also seen. These are driver circuits for
the high-fanout control and clock signals. Each takes as
its input a control signal and produces as outputs the
control signal and its inverse, both driven by 3-micron x
160-micron transistors. This amplifier is the same design
used by the output pads to drive off-chip loads.

This final layout represents one implementation of a
pipelined CLA adder designed for testability. The relative
merits of this design and others that might have been
implemented can, as yet, only be discussed qualitatively.
The addition of SPICE 2G7 to the CAD toolbag will provide
future CMOS designers with the quantitative analysis
necessary to make decisions involving tradeoffs among
primary design objectives.
This final design, when simulated using RNL, functioned
properly at clock speeds up to 14 megahertz. Testing of the
actual chips produced by MOSIS should give an indication of
the accuracy of RNL's predictions. The following chapter
presents a test plan to check for proper operation of the
adder at low clock rates and to determine the maximum
operating speed.


V. TEST PLAN


After several iterations of the design-simulate-redesign
loop, a final layout was achieved for the 16-bit pipelined
adder. These iterations provide considerable confidence in
the logical correctness of the layout. Appendix D contains
RNL simulation results for the full adder. In reading these
results it should be kept in mind that the adder requires
three cycles of the two-phase clock to produce the sum. In
the first part of the simulation, the inputs were kept
constant for three clock cycles to make the results easier
to read. With these steady inputs, simulations were run to
verify the generation of correct sums, concentrating on
those addends that would produce carry propagates and carry
generates across the boundaries of the 4-bit blocks. The
last part of the simulation used different inputs each
clock cycle. This was done to test the pipelining feature of
the design, ensuring no dependence on repeated inputs of the
addends to produce the proper sum.

After fabrication of the chip, application of similar
inputs to make the same determinations for the actual
circuits will form the initial portion of the test plan. In
this chapter a test plan for the verification of
computational correctness and speed is presented.

A.  INPUTS  AND  OUTPUTS 

The first step in testing the chip will be to connect it
to the required input and output circuitry. To accomplish
this, the identity of the inputs and outputs on each pin
must be determined. Microscopic examination of the chip
will reveal the logo "16-bit Add", located between the GND
and Vdd buses for the pads in the northeast corner (see
Figure 4.8, which is repeated below for convenience). Using
this landmark, the signals on the pads can be labeled as
follows.


Figure 4.8 (repeated) Final Layout
The western edge has sixteen input pads for the addend
A, with the least significant bit, A(1), located at the
northern end. The northern edge of the chip also has
sixteen input pads for the addend B, with the least
significant bit, B(1), located at the eastern end. The
southern edge has fourteen output pads and two input pads.
At its western end is the GND input pad, followed by fourteen
output pads for S(16), the most significant bit of the sum,
through S(3). Following S(3), at the eastern end, is the
input pad for Vdd. The eastern edge of the chip has eight
input pads and eight output pads. Starting at the northern
end, there are input pads for phi1, phi2, Cin, CON1 (control
signal for the dual mode register of event 1), CON2, CON3,
CON4, and CON5. They are followed by output pads for SREG1
(serial output from the dual mode register of event 1),
SREG2, SREG3, SREG4, SREG5, Cout, S(2), and S(1) at the
southern end.

To supply power to the chip, +5 volts DC should be
applied to the Vdd pad and 0 volts to the GND pad. All
logical inputs, including clocks and control signals, should
be either GND for a logical 0 or Vdd for a logical 1.
Simulation with RNL revealed some restrictions on the clock
signals. For proper operation, each clock should remain
high for a minimum of 20 nanoseconds, and the clock
interphase time, when both phi1 and phi2 are low, must be at
least 10 nanoseconds in duration. For initial testing, the
clock speed should be adjusted so that both of the above
clock parameters are exceeded by one order of magnitude.
This ensures that charge sharing problems caused by too short
an interphase time, and fanout problems caused by too short a
clock phase duration, are not interpreted as fabrication
errors.
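The two RNL limits translate directly into a clock period. The sketch assumes the usual non-overlapping sequence: phi1 high, interphase gap, phi2 high, interphase gap. It also assumes equal phase widths and equal gaps; the text gives only the minima. Under these assumptions one full cycle lasts 2(t_high + t_gap). The Python sketch below computes the period and frequency for the RNL minima and for the ten-times-longer settings recommended for initial testing.

```python
def two_phase_timing(t_high_ns: float, t_gap_ns: float) -> tuple[float, float]:
    """Return (period in ns, frequency in MHz) for a symmetric non-overlapping
    two-phase clock: phi1 high, gap, phi2 high, gap."""
    period_ns = 2 * (t_high_ns + t_gap_ns)
    return period_ns, 1000.0 / period_ns  # 1000 / (period in ns) = MHz


minimum = two_phase_timing(20, 10)    # RNL minima: 20 ns high, 10 ns interphase
initial = two_phase_timing(200, 100)  # both minima exceeded by a factor of ten
print(f"minimum:      {minimum[0]:.0f} ns, {minimum[1]:.2f} MHz")  # 60 ns, 16.67 MHz
print(f"initial test: {initial[0]:.0f} ns, {initial[1]:.2f} MHz")  # 600 ns, 1.67 MHz
```

For comparison, the 14 MHz at which the RNL simulation of the design was reported to function corresponds to a period of about 71 ns.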

The outputs, like the inputs, are at Vdd to represent a
logical 1 and at GND to represent a logical 0. The circuits
used to measure the outputs should have high input
impedance, on the order of one megohm. The output pads of
the adder are not designed to handle the current source and
sink requirements of transistor-transistor logic (TTL)
integrated circuits. The output measurement circuits should
be constructed using NMOS or CMOS devices that are designed
to operate between +5 volts DC and ground.

B. TESTING FOR CORRECT OPERATION

After connecting the adder to a test harness, the next
step is to verify the generation of correct sums by the
adder. Several inputs should be included in the testing to
verify the correct operation of individual circuits; these
are contained in Appendix F. In addition to the test vectors
of Appendix F, several randomly selected input vectors
should be tested. If the adder should fail to generate
correct sums, the LSSD features can be employed to examine
intermediate results.
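Random vectors and their expected results can be prepared ahead of time. The sum appears three clock cycles after the addends are presented, so the expected output at cycle n belongs to the inputs applied at cycle n - 3. The Python sketch below generates such a schedule. The function name, the fixed seed, and the use of `None` for cycles with no defined input or output are assumptions for illustration.

```python
import random


def test_schedule(n_vectors: int, latency: int = 3, seed: int = 1):
    """Yield (cycle, inputs, expected) for a pipelined 16-bit adder test.

    inputs is (A, B, Cin) applied at that cycle (None once the vectors run out);
    expected is the (sum, cout) that should appear at that same cycle, i.e. the
    result for the inputs applied `latency` cycles earlier (None before then).
    """
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    vectors = [(rng.getrandbits(16), rng.getrandbits(16), rng.getrandbits(1))
               for _ in range(n_vectors)]
    for cycle in range(n_vectors + latency):
        inputs = vectors[cycle] if cycle < n_vectors else None
        expected = None
        if cycle >= latency:
            a, b, cin = vectors[cycle - latency]
            total = a + b + cin
            expected = (total & 0xFFFF, total >> 16)
        yield cycle, inputs, expected


for cycle, inputs, expected in test_schedule(4):
    print(cycle, inputs, expected)
```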

1. Intermediate results

With the LSSD design, a tester can leave input
levels constant for a long period of time and use the shift
mode of the internal registers to examine the internal state
of the chip. The rightmost bit of each register is always
available at the output pad for that register. To obtain
the contents of the other bits, the control signal for the
given register is set to and held at logical 1 while the
clock continues to run. For registers 1, 3, and 5 the
serial output will be meaningful and stable while phi2 is
high. The serial output of registers 2 and 4 will be stable
while phi1 is high. Table 3 lists, in order, the intermediate
values available at the SREG(n) output pad when the input
CONn is high.
TABLE 3

Register Serial Outputs

Clock
Cycle   SREG1   SREG2   SREG3   SREG4   SREG5

    0   B1      P1      BP1     Cin     S1
    1   B2      P2      IES3    BC2     S3
    2   B3      P3      IES4    Cout    S5
    3   B4      P4      BG2     ES2     S7
    4   B5      P5      IES5    ES4     S9
    5   B6      P6      IES6    ES6     S11
    6   B7      P7      IC67    ES8     S13
    7   B8      P8      BP3     ES10    S15
    8   B9      P9      IES11   ES12    0
    9   B10     P10     IES12   ES14    Cout
   10   B11     P11     BG4     ES16    S2
   11   B12     P12     IES13   BC1     S4
   12   B13     P13     IES14   BC3     S6
   13   B14     P14     IC1415  ES1     S8
   14   B15     P15     BG1     ES3     S10
   15   B16     P16     IES1    ES5     S12
   16   A1      G1      IES2    ES7     S14
   17   A2      G2      IC23    ES9     S16
   18   A3      G3      BP2     ES11    0
   19   A4      G4      IES7    ES13    0
   20   A5      G5      IES8    ES15    0
   21   A6      G6      BG3     0       0
   22   A7      G7      IES9    0       0
   23   A8      G8      IES10   0       0
   24   A9      G9      IC1011  0       0
   25   A10     G10     BP4     0       0
   26   A11     G11     IES15   0       0
   27   A12     G12     IES16   0       0
   28   A13     G13     Cin     0       0
   29   A14     G14     0       0       0
   30   A15     G15     0       0       0
   31   A16     G16     0       0       0
   32   Cin     Cin     0       0       0
   33   0       0       0       0       0
   34   0       0       0       0       0
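When register 5 is shifted out, its serial stream interleaves the odd and even sum bits. The Python sketch below rebuilds the 16-bit sum and the carry-out from the sampled pad values. It assumes the SREG5 column of Table 3: 18 samples, starting at clock cycle 0. The names `SREG5_ORDER` and `decode_sreg5` are hypothetical.

```python
# SREG5 column of Table 3, clock cycles 0-17; "0" marks a position with no sum bit.
SREG5_ORDER = (
    [f"S{i}" for i in range(1, 16, 2)]    # cycles 0-7: S1, S3, ..., S15
    + ["0", "Cout"]                        # cycles 8-9
    + [f"S{i}" for i in range(2, 17, 2)]  # cycles 10-17: S2, S4, ..., S16
)


def decode_sreg5(samples: list[int]) -> tuple[int, int]:
    """Rebuild (sum, cout) from serial samples taken at the SREG5 pad."""
    if len(samples) < len(SREG5_ORDER):
        raise ValueError("need one sample per entry in SREG5_ORDER")
    total, cout = 0, 0
    for name, bit in zip(SREG5_ORDER, samples):
        if name == "Cout":
            cout = bit
        elif name.startswith("S"):
            total |= bit << (int(name[1:]) - 1)  # S1 is the least significant bit
    return total, cout
```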

C.  TESTING  FOR  SPEED  OF  OPERATION 

Once the chips containing fabrication errors have been
culled from the chip set returned by MOSIS, the task
remaining is to determine just how fast the adder can run.
Rather than simply increasing the clock rate until the adder
fails, the duration of each clock phase (the time phi1 or
phi2 is high) and the interphase time should be reduced
separately. RNL simulation indicates that the circuit which
generates S4 within PLA104 is the limiting circuit for clock
phase duration (i.e. it requires the longest time to
correctly
diffext    0        ; diffusion extension for each transistor, i.e. each
                    ; transistor is assumed to have a rectangular source
                    ; and drain diffusion extending diffext units wide and
                    ; transistor-width units high. The effect of the
                    ; diffusion extension is to add some capacitance to
                    ; the source and drain node of each transistor;
                    ; useful when processing the output of NET to improve
                    ; the capacitive loading approximations without adding
                    ; explicit load capacitors. diffext is specified in
                    ; lambda (it will be converted using the lambda factor
                    ; above).

resistance channel context width length resist
; this command specifies the equivalent resistance for a transistor
; of type channel with the specified width and length. Transistors
; matching this entry will have the specified resistance; linear
; interpolation is done if the width and/or length is not matched
; exactly.

; channel is one of "enh", "dep", "intrinsic", "low-power",
; "pullup", or "p-chan"

; context is one of "static", "dynamic-high", "dynamic-low", or "power"

; width is given in lambda
; length is given in lambda
; resist is given in ohms

(*) These parameters should be 1 only when processing the output of
the node extractor. They cause various corrections to be made
to the interconnect component of a node's capacitance; usually
only extracted sim files have information regarding interconnect
capacitance.

PRESIM uses these parameters in calculating the capacitance for each
electrical node and the resistance for each transistor channel.
voltage; for example, '-p5' specifies a VDD of 5 volts. The result is
printed after PRESIM completes its other processing. When figuring the
resistance of a pullup device, the 'power' characteristic resistance as
set in the config file is used.

The optional third file (config) specifies various electrical parameters.
The internal values (the defaults) are a generic set; they do not reflect
any particular fabrication process. (UW-NW VLSI NOTE: A configuration file
is provided in the source code that duplicates the internal settings as an
example of how this file could be used. In addition we note that the
resistor values are stored sorted first by width, then by length, not by
the ratio. Values not explicitly provided in the configuration file are
estimated by linear interpolation.) The format of this file is lines of
the form

parameter value comments

Lines beginning with ';' are treated as all comment. The parameter names
and their default values are:

; configuration file for 'standard' MPC process

capm2a     .00000   ; 2nd metal capacitance - area, pf/sq-micron
capm2p     .00000   ; 2nd metal capacitance - perimeter, pf/micron
capma      .00003   ; 1st metal capacitance - area, pf/sq-micron
capmp      .00000   ; 1st metal capacitance - perimeter, pf/micron
cappa      .00004   ; poly capacitance - area, pf/sq-micron
cappp      .00000   ; poly capacitance - perimeter, pf/micron
capda      .00010   ; n-diffusion capacitance - area, pf/sq-micron
capdp      .00060   ; n-diffusion capacitance - perimeter, pf/micron
cappda     .00010   ; p-diffusion capacitance - area, pf/sq-micron
cappdp     .00060   ; p-diffusion capacitance - perimeter, pf/micron
capga      .00040   ; gate capacitance - area, pf/sq-micron

lambda     2.5      ; microns/lambda (conversion from .sim file units
                    ; to units used in cap parameters)

lowthresh  0.3      ; logic low threshold as a normalized voltage
highthresh 0.8      ; logic high threshold as a normalized voltage

cntpullup  0        ; <>0 means that the capacitor formed by gate of
                    ; pullup should be included in capacitance of output
                    ; node

diffperim  0        ; <>0 means do not include diffusion perimeters
                    ; that border on transistor gates when figuring
                    ; sidewall capacitance (*)

subparea   0        ; <>0 means that poly over transistor region will not
                    ; be counted as part of the poly-bulk capacitor (*)
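A calibration file like the one used for the adder is easy to mistype, so it can help to read it back before passing it to Presim. The Python sketch below follows the format described above: lines beginning with ';' are comments, other lines are `parameter value comment`, and `resistance` lines carry five fields. It is an illustration, not Presim's own reader, and the function name `parse_presim_config` is hypothetical. Presim's interpolation of resistance values is not reproduced.

```python
def parse_presim_config(text: str) -> tuple[dict[str, float], list[tuple]]:
    """Parse 'parameter value comment' lines into (parameters, resistance entries)."""
    params: dict[str, float] = {}
    resistances: list[tuple] = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop whole-line and trailing comments
        if not line:
            continue
        fields = line.split()
        if fields[0] == "resistance":
            # resistance channel context width length resist
            channel, context = fields[1], fields[2]
            width, length, ohms = (float(x) for x in fields[3:6])
            resistances.append((channel, context, width, length, ohms))
        else:
            params[fields[0]] = float(fields[1])
    return params, resistances


sample = """; excerpt in the format shown above
capga .00057 ; gate capacitance - area, pf/sq-micron
lambda 1.0
"""
params, _ = parse_presim_config(sample)
print(params)  # {'capga': 0.00057, 'lambda': 1.0}
```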
(This document is based on portions of the document 'User's Guide to NET,
PRESIM and RNL/NL,' by Christopher J. Terman, Laboratory for Computer
Science, M.I.T., Cambridge, MA 02139.)


One must first convert the sim file to a network file suitable for use by
RNL or NL; to do this we run PRESIM:

presim foo.sim foo [config] [options ...]

which converts the file foo.sim into a binary file for RNL/NL called foo.

The  -f  option: 

Suppresses  the  sum-of-products  formation.  This  may  be  desired  if  you  think 
snm-of-products  is  formed  wrong  otherwise  the  advantages  of  the  translator  and 
node  reduction  make  this  option  unattractive. 

The -c option:

-cfile minvalue

writes a list of node names and capacitances to the specified file. Only capacitances larger than
minvalue will be included.

The -t option:

-tfile minvalue

writes a list of transistors and RC values to the specified file; there are two entries for each
transistor. The R's come from the size of the transistor, the C's from the source/drain capacitance.
Only RC values larger than minvalue will be included.
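The -t listing pairs each transistor's resistance, which comes from its size, with the capacitance on its source and on its drain, giving two entries per device. Below is a minimal sketch of that bookkeeping. It assumes resistance proportional to length/width through a hypothetical ohms-per-square constant. The Transistor record and the rc_entries function are illustrative names, not PRESIM internals.

```python
from dataclasses import dataclass

@dataclass
class Transistor:
    name: str
    length: float    # channel length, microns
    width: float     # channel width, microns
    c_source: float  # source-side diffusion capacitance, pF
    c_drain: float   # drain-side diffusion capacitance, pF

def rc_entries(transistors, r_per_square, min_value):
    """Return (name, side, rc) tuples, two per transistor (source side and
    drain side), keeping only rc > min_value.

    r_per_square is a hypothetical effective channel resistance in ohms per
    square, so rc comes out in ohm-pF, i.e. picoseconds.
    """
    entries = []
    for t in transistors:
        r = r_per_square * t.length / t.width   # resistance scales with L/W
        for side, c in (("source", t.c_source), ("drain", t.c_drain)):
            rc = r * c
            if rc > min_value:
                entries.append((t.name, side, rc))
    return entries

devices = [Transistor("q1", 2.0, 4.0, 0.05, 0.02),
           Transistor("q2", 2.0, 8.0, 0.03, 0.03)]
for entry in rc_entries(devices, r_per_square=10000.0, min_value=100.0):
    print(entry)
```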

The -p option:

-pfile voltage

provides a worst-case estimate of the circuit power consumption by assuming that all the pullups
(DEP or LOWP devices with drain=VDD) are on simultaneously. 'Voltage' specifies the supply




UW/NW  VLSI  Release  2 


101/83 


Config file used to calibrate RNL:

capm2a
capm2p
cappa
capda

.00000
.00000
.00006
.00000
.00006
.00000
.00010
.00060
.00010
.00060
.00057


lambda 1.0
RULEC(CAD)            CAD Toolbox User's Manual            RULEC(CAD)


NAME

rulec - Compile design rules for Lyra

SYNOPSIS

rulec [-lo] rules

DESCRIPTION

Rulec is a shell script with the following processing steps:

i) The actual Lyra rule compiler is invoked to translate the symbolic rule
description, rules.r, to lisp code, rules.l.

ii) The lisp compiler, Liszt, is invoked to compile rules.l to rules.o.

iii) rules.o is loaded into Lyra.proto to generate an executable lisp Lyra,
rules.

iv) The intermediate files rules.l and rules.o are deleted.

The following options are supported:

-l (load only) No compilation is done. Previously compiled rules, rules.o,
are loaded into Lyra.proto to generate an executable Lyra, rules. This
option is useful mainly at Berkeley, where Lyra.proto changes frequently.

-o (save object) Name.o is not removed. Enables 'rulec -l rules' in the
future.

FILES

~cad/bin/rulec -- rulec shell script.

~cad/lib/lyra/Rulec -- lisp rule compiler.
~cad/lib/lyra/Lyra.proto -- Lyra sans compiled rules code.

~cad/lib/lyra/*.r -- standard rulesets.

~cad/lib/lyra/DEFAULTS -- gives default rulesets for Caesar technologies.

SEE ALSO

Lyra(CAD)

Liszt(1)

AUTHOR 

Michael  Arnold. 
Slow p-type Fast n-type

.model nsf nmos level=2 rsh=10 tox=600e-10 ld=.40e-6
+xj=.60e-6 cj=3.0e-4 cjsw=2.0e-10 uo=675 vto=0.6
+cgso=2.0e-10 cgdo=2.0e-10 nsub=0.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25

.model psf pmos level=2 rsh=80 tox=600e-10 ld=.25e-6
+xj=.35e-6 cj=4.1e-4 cjsw=2.5e-10 uo=190 vto=-1.2
+cgso=1.2e-10 cgdo=1.2e-10 nsub=5.0e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15
Fast p-type Slow n-type

.model nfs nmos level=2 rsh=30 tox=600e-10 ld=.25e-6
+xj=.35e-6 cj=6.0e-4 cjsw=4.0e-10 uo=475 vto=1.2
+cgso=1.9e-10 cgdo=1.9e-10 nsub=1.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25

.model pfs pmos level=2 rsh=20 tox=600e-10 ld=.40e-6
+xj=.60e-6 cj=2.0e-4 cjsw=1.0e-10 uo=270 vto=-0.6
+cgso=2.0e-10 cgdo=2.0e-10 nsub=0.3e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15


Fast p-type Fast n-type

.model nff nmos level=2 rsh=10 tox=550e-10 ld=.40e-6
+xj=.60e-6 cj=3.0e-4 cjsw=2.0e-10 uo=675 vto=0.6
+cgso=2.5e-10 cgdo=2.5e-10 nsub=0.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25

.model pff pmos level=2 rsh=20 tox=550e-10 ld=.40e-6
+xj=.60e-6 cj=2.0e-4 cjsw=1.0e-10 uo=270 vto=-0.6
+cgso=2.5e-10 cgdo=2.5e-10 nsub=0.3e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15


APPENDIX  & 

SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES


CMO models for MOSIS 3-micron CMOS Bulk p-well devices:

Fast Models

.model n nmos vto=0.4 tox=0.7e-7 lambda=1e-7 ld=1e-6
+xj=1.1e-6 gamma=.3 uo=500 cbd=5e-4 cbs=5e-4

.model p pmos vto=-.4 tox=0.7e-7 lambda=1e-7 ld=1e-6
+xj=1.1e-6 gamma=.3 uo=300 cbd=3.5e-4 cbs=3.5e-4

Slow Models

.model n nmos vto=1.0 tox=0.8e-7 lambda=1e-7 ld=.5e-6
+xj=0.6e-6 gamma=1.3 uo=400 cbd=6e-4 cbs=6e-4
.model p pmos vto=-1.0 tox=0.8e-7 lambda=1e-7 ld=.5e-6
+xj=0.6e-6 gamma=.9 uo=200 cbd=4.1e-4 cbs=4.1e-4


MIT Models for MOSIS 3-micron CMOS Bulk p-well devices:
Slow - Slow

.model nss nmos level=2 rsh=50 tox=650e-10 ld=.25e-6
+xj=.35e-6 cj=6e-4 cjsw=4e-10 uo=475 vto=1.2
+cgso=1.3e-10 cgdo=1.3e-10 nsub=1.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25

.model pss pmos level=2 rsh=80 tox=650e-10 ld=.25e-6
+xj=.35e-6 cj=4.1e-4 cjsw=2.5e-10 uo=190 vto=-1.2
+cgso=1.3e-10 cgdo=1.3e-10 nsub=5e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.15
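The level-2 cards in this appendix give the gate-oxide thickness tox in meters. As a quick consistency check, the oxide capacitance per unit area follows from Cox = k_ox * eps0 / tox. The sketch assumes k_ox = 3.9 for SiO2. For the 600-angstrom cards this gives roughly 5.8e-4 pF/sq-micron. That is close to the .00057 entry in the RNL calibration file, provided that entry is the gate-capacitance parameter.

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m
K_OX = 3.9        # assumed relative permittivity of SiO2

def cox(tox_m):
    """Gate-oxide capacitance per unit area for an oxide tox_m meters thick.
    Returned in F/m^2, which is numerically equal to pF/sq-micron
    (1 F/m^2 = 1e12 pF / 1e12 sq-micron)."""
    return K_OX * EPS0 / tox_m

for models, tox in [("nsf psf nfs pfs", 600e-10),
                    ("nff pff", 550e-10),
                    ("nss pss", 650e-10)]:
    print(f"{models}: tox = {tox * 1e10:.0f} A, Cox = {cox(tox):.2e} pF/sq-micron")
```

The thinner 550-angstrom oxide of the fast-fast cards gives the largest Cox, which is one reason those devices switch faster.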
is highly recommended. An added benefit of installing the
Release 2.0 package is the cell library provided. The
library contains several basic standard cells with known
performance characteristics. The library also contains the
standard pad frames used by MOSIS. Though MOSIS does not
require the use of standard pad frames on designs submitted,
their use does speed up fabrication.

As mentioned earlier, as soon as SPICE 2G7 is available,
its addition to the CAD toolbag would be most advantageous
to a CMOS designer.

C.  DESIGN  OF  THE  ADDER 

If the design of the adder were to be undertaken again,
a different approach to generating the sum would probably
be used, especially if the new CAD tools mentioned
above were available. The logic approach to the computation
would still involve CLA addition, but it would be accomplished
using combinational logic and library cells rather
than PLA's. Testability would probably suffer greatly, but
an effort would be made to reduce the sum generation to two
logical events. Although the testability built into
the current design should provide considerable insight into
CMOS Bulk p-well performance and CAD tool accuracy, that
investigation would not need to be repeated.


VI. CONCLUSIONS


The experience gained in the design of the adder, coupled
with the clarity of hindsight, leads to the following
conclusions and recommendations.

A. THE CMOS TECHNOLOGIES

The  CMOS  technologies  will  play  a  role  of  steadily 
increasing  importance  in  the  VLSI  designs  of  the  future. 
MOSIS  is  already  offering,  on  an  experimental  basis,  CMOS 
Bulk  p-well  fabrication  with  a  one-micron  minimum  feature 
size.  A  scalable  set  of  design  rules,  to  allow  initial 
fabrication  in  3-micron  CMOS  for  design  verification  before 
the  far  more  expensive  1-micron  process  is  used,  is  being 
developed. 

In  the  private  sector  there  is  considerable  research 
aimed  at  finding  an  insulating  substrate  material  that  does 
not  have  the  variability  and  thermal  problems  of  sapphire. 
Progress  in  this  area  will  remove  the  drawback  caused  by 
latchup  tendencies  in  CMOS  Bulk. 

B.  CMOS  CAD  TOOLS 

Though the design tools currently available at NPS constitute
a complete set for the design of CMOS Bulk p-well
circuits, the recent CAD tool set released by the
University of Washington/Northwest VLSI Consortium, Release
2.0 [Ref. 11], coupled with the University of California at
Berkeley Winter 1983 CAD tools, represents a more complete
and cohesive set for CMOS design. When sufficient disk
space on the Vax 11-780 becomes available to load
Release 2.0, implementation of the Release 2.0 CAD package


simulations have indicated that the next slowest circuit
(PLA915) is at least 20% faster than PLA104 (16.0 nsec for
PLA915 vs. 20.1 nsec for PLA104). Also, all other PLA's
functioned properly with a 5 nsec interphase time.

Should  PLA104  prove  to  be  the  speed  limiting  circuit  for 
the  chip,  the  actual  failure  speeds  of  the  chip  can  serve  as 
an  indication  of  the  accuracy  of  the  RNL  simulation  for 


nor fully off. Subsequent inputs of in1=0 and in2=1 may
produce correct results because, with constant inputs, each
precharge time will add more charge to C2 until there is
sufficient charge to allow the output of the Q5-Q6 inverter
to remain low.
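A short numerical example shows why a constant input can hide the problem. The sketch below assumes that C2 charges through a single RC path during each precharge interval and keeps its charge between intervals, which is the constant-input case described above. Voltages are normalized to Vdd, as the RNL logic thresholds are. The RC value, the interval and the 0.6 threshold are hypothetical.

```python
import math

def precharges_needed(t_pre, rc, v_threshold, v_start=0.0, max_cycles=50):
    """Count precharge intervals until a node charged through an RC path
    reaches v_threshold (voltages normalized to Vdd = 1).

    Assumes the node keeps its charge between precharges, as it does when
    the inputs are held constant. Returns (cycles, voltage), with cycles
    None if the threshold is not reached within max_cycles.
    """
    v = v_start
    for n in range(1, max_cycles + 1):
        # Exponential approach toward Vdd during one precharge interval.
        v = 1.0 - (1.0 - v) * math.exp(-t_pre / rc)
        if v >= v_threshold:
            return n, v
    return None, v

# Hypothetical: precharge interval equal to half the RC time constant.
print(precharges_needed(t_pre=0.5, rc=1.0, v_threshold=0.6))
```

With these values the first precharge leaves C2 near 0.39 Vdd and the second brings it above 0.6 Vdd. A repeated input can therefore pass even though the first evaluation after a change fails. Alternating the inputs discharges C2 on every other cycle, which removes this accumulation.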

Thus, to check for charge sharing problems in the
circuit of Figure 5.1, the inputs must alternate. Likewise,
in PLA104, to check for charge sharing errors in output S1,
its inputs must alternate between ES1=0, BC=0 and ES1=1,
BC=1 as the interphase time is reduced. This can be accomplished
for all four instances of PLA104 simultaneously by
alternating inputs of

A  =  0001  1001  1001  1001 
B  =  0000  1000  1000  1000 
Cin  =  1 

and 

A  =  0000  0000  0000  0000

B  =  0000  0000  0000  0000 
Cin  =  0 

To check for charge sharing errors in S4, the inputs to PLA104
must cycle between BC=1, S4=0, S3=S2=1, S1=0 and
BC=0, S4=0, S3=S2=S1=1. This may be accomplished for all four
instances of PLA104 simultaneously by alternating inputs of
A  =  0110  1110  1110  1110 
B  =  0000  1000  1000  1000 
Cin  =  1 

and 

A  =  0111  0111  0111  0111 
B  =  0000  0000  0000  0000 
Cin  =  0 
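Both pairs of test vectors can be checked mechanically. The sketch below makes three assumptions, suggested by the signal names: the PLA104 inputs S1..S4 (ES1 in the first test) are the exclusive-OR of the A and B bits within each 4-bit block, BC is the carry into that block, and the vectors are written most significant bit first. Under those assumptions it computes the inputs each block receives and confirms that all four blocks see the same pattern.

```python
def parse_vector(text):
    """'0001 1001 1001 1001' (most significant bit first) -> int."""
    return int(text.replace(" ", ""), 2)

def block_inputs(a, b, cin):
    """For each 4-bit block, least significant first, return the bitwise
    exclusive-OR bits S1..S4 of A and B and the carry into the block (BC)."""
    blocks = []
    carry = cin
    for k in range(4):
        na = (a >> (4 * k)) & 0xF
        nb = (b >> (4 * k)) & 0xF
        x = na ^ nb
        blocks.append({"BC": carry, "S1": x & 1, "S2": (x >> 1) & 1,
                       "S3": (x >> 2) & 1, "S4": (x >> 3) & 1})
        carry = (na + nb + carry) >> 4   # carry out of this block
    return blocks

tests = {
    "S1 test, first":  ("0001 1001 1001 1001", "0000 1000 1000 1000", 1),
    "S1 test, second": ("0000 0000 0000 0000", "0000 0000 0000 0000", 0),
    "S4 test, first":  ("0110 1110 1110 1110", "0000 1000 1000 1000", 1),
    "S4 test, second": ("0111 0111 0111 0111", "0000 0000 0000 0000", 0),
}
for name, (a, b, cin) in tests.items():
    blocks = block_inputs(parse_vector(a), parse_vector(b), cin)
    same = all(blk == blocks[0] for blk in blocks)
    print(name, blocks[0], "identical in all four blocks:", same)
```

Under this reading the S4 vectors give BC=1, S4=0, S3=S2=1, S1=0, followed by BC=0, S4=0, S3=S2=S1=1, in every block. This matches the sequence stated above.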

This  maximum  speed  testing  assumes  that  RNL  has  correctly 
identified  the  slowest  circuits  on  the  chip.  RNL 
produce out=0 when phi1 is high. Now assume that the next
input is in1=0 and in2=1, which should also evaluate to
out=0. However, if the precharge time (when the inputs are
present on the gates of Q2 and Q3 and phi1 is still low) is
insufficient, C2 will not be charged to Vdd when precharging
ends (C2 was discharged to zero volts during the previous
evaluation, when in1 was high and phi1 was high). Now, when
evaluation begins (phi1 going high), the low voltage across
C2 causes Q5 and Q6 to interpret their input as a logical 0.
As a result, the output of the Q5-Q6 inverter pair goes high,
causing Q8 to turn on, discharging C4 and resulting in an
output of logical 1, which is incorrect. Table 4 lists the
proper evaluation sequence when precharge time is sufficient
and the improper sequence due to insufficient precharge
time. In this table, for the inputs, output, and capacitor
voltages, a 1 indicates Vdd, 0 indicates GND, and X indicates
somewhere in between. For the transistors, a 1 indicates
on, a 0 indicates off, and an X indicates neither fully on

TABLE 4

PLA Evaluation Sequences


Proper evaluation sequence:

phi    in     C        Q
1 2    1 2    1234     1234567890    out
1 0    1 0    0011     1100010001    0
0 0    1 0    0011     0101011001    0
0 1    0 1    0011     0101011001    0
0 0    0 1    0111     0011011001    0
1 0    0 1    0111     1010010001    0

Improper evaluation sequence:

phi    in     C        Q
1 2    1 2    1234     1234567890    out
1 0    1 0    0011     1100010001    0
0 0    1 0    0011     0101011001    0
0 1    0 1    0011     0101011001    0
0 0    0 1    0X11     0011011001    0
1 0    0 1    0XX0     1010XX0X10    1



evaluate its inputs). RNL simulation also indicates that
the circuits in PLA104 which generate S1 and S4 are the
limiting circuits for the clock interphase duration.

Since  the  PLA  is  constructed  of  precharged  dynamic 
circuits,  the  evaluation  clock  phase  must  be  long  enough  to 
allow  the  inputs  to  drive  the  outputs  to  their  proper 
values,  even  if  the  inputs  are  the  same  as  those  of  the 
previous  evaluation  cycle.  This  allows  the  tester  to  use  a 
constant  input  as  the  duration  of  each  clock  phase  is 
reduced  until  the  adder  produces  incorrect  results. 

Determination  of  the  clock  interphase  duration  limit  is 
more  difficult.  This  is  because  the  inputs  to  a  PLA  must  be 
changing  to  cause  charge  sharing  problems  to  occur.  For 


Figure 5.1 Charge Sharing in a PLA.

example,  in  Figure  5.1  assume  that  the  first  set  of  inputs 
is  in1=1,  in2=0,  and  that  this  is  correctly  evaluated  to 


APPENDIX D
ADDER SIMULATION


The following two listings are: (1) the RNL command file
for the entire chip and (2) the results of running that
command file. In addition to this overall testing, all the
layouts of Appendix G were simulated individually. A useful
feature of RNL is the indication of when a watched node
changes state. Thus, by making all the outputs of a circuit
watched nodes, RNL will provide the minimum time duration
for a clock cycle to produce the outputs (the longest time
indicated by the simulation). This can be confirmed by
running the simulation with a faster clock, resulting in
outputs of X (neither 1 nor 0) where insufficient time has
been allowed.
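Reading the minimum cycle time off the RNL output amounts to finding the latest transition among the watched outputs. The small sketch below assumes event lines of the form name=value @ time, as in the listing that follows, where the time appears to be measured in ns from the start of the current step. The sample lines follow that format, and the function name is hypothetical.

```python
import re

# One RNL event line, e.g. "s16=0 @ 14.2": node name, new value, time.
EVENT = re.compile(r"^\s*(\w+)=([01X])\s*@\s*([0-9.]+)\s*$")

def slowest_event(lines, watched):
    """Return (node, time) of the latest transition among watched nodes,
    or None if no watched node changed."""
    worst = None
    for line in lines:
        m = EVENT.match(line)
        if m and m.group(1) in watched:
            t = float(m.group(3))
            if worst is None or t > worst[1]:
                worst = (m.group(1), t)
    return worst

step = ["phi2=1 @ 0", "s16=0 @ 14.2", "s9=0 @ 16.4", "s2=0 @ 16.7",
        "s1=0 @ 20", "cout=1 @ 21.1"]
outputs = {f"s{i}" for i in range(1, 17)} | {"cout"}
print(slowest_event(step, outputs))   # ('cout', 21.1)
```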

RNL simulation to determine the minimum time for precharging
the PLA circuits is only slightly more involved.
For each product term in the PLA, alternating inputs are
selected that will result in the maximum amount of N+ diffusion
needing to be charged from 0 volts to Vdd. Then, as these
inputs are alternated, the PLA precharge time is reduced
until the circuit fails to produce correct results. For the
PLA's in the adder, visual inspection for the product term
with the longest precharge requirement was done by looking
for the longest N+ diffusion line which must be charged
through the maximum number of transistors. The visual
inspection results were confirmed by RNL simulations.
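The procedure of stepping the precharge interval (or a clock phase) down until the outputs go wrong can be written as a simple search loop. In this sketch, passes(t) is a hypothetical stand-in for one RNL run at duration t that reports whether the outputs were correct. It is not an RNL function.

```python
def shortest_passing(passes, start, step, floor=0.0):
    """Shorten a duration by `step` until the circuit fails; return the
    last duration that still passed, or None if even `start` fails.
    `passes(t)` stands in for one simulation run at duration t."""
    if not passes(start):
        return None
    t = start
    while t - step > floor and passes(t - step):
        t -= step
    return t

# Hypothetical stand-in for a simulation: correct results only at >= 7.5 ns.
print(shortest_passing(lambda t: t >= 7.5, start=25.0, step=2.5))  # 7.5
```

A linear step-down mirrors the manual procedure described above. A bisection search would need fewer runs but assumes the pass/fail boundary is a single clean threshold.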
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acr  21-  1  3:bo  19 


c "  1  r: , C ” d  rote  1 


(log-file "chip.log")

( load  "u*  stc.  I  "  ) 

(load "uwsim.l")
(read-network 'chip)

(setq nodes '(a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16
  b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16
  s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16
  cin cout phi1 phi2 con1 con2 con3 con4 con5))

(chflag nodes)

(l '(a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16
  b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16
  con1 con2 con3 con4 con5
  cin phi1 phi2))

(defvec '(bin clocks phi1 phi2))

(defvec '(bin aaaa a16 a15 a14 a13 a12 a11 a10 a9 a8 a7 a6 a5 a4 a3 a2 a1))

(defvec '(bin bbbb b16 b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1))

(defvec '(bin sum cout s16 s15 s14 s13 s12 s11 s10 s9 s8 s7 s6 s5 s4 s3 s2 s1))

(def-report '("state is now:" (vec clocks) cin cout newline
  (vec aaaa) newline
  (vec bbbb) newline
  (vec sum)))

(defun ss (dummy)
  (step incr))

(defun cycles (n)
  (repeat i 1 n
    (setq incr 100)
    (ss '(x))
    (setq incr 250)
    (h '(phi1))
    (ss '(x))
    (setq incr 100)
    (l '(phi1))
    (ss '(x))
    (setq incr 250)
    (h '(phi2))
    (s '(x))
    (l '(phi2))
  )
)


(cycles 5)

(invec '(aaaa 0b0000111100001111))
(invec '(bbbb 0b1111000011110000))
(cycles 3)

(invec '(bbbb 0b0000000100000001))
(cycles 3)

(h 'cin)
(cycles 3)

(l 'cin)
(invec '(aaaa 0b1111111111111111))
(invec '(bbbb 0b0000000000000000))
(cycles 3)

(h 'cin)
(cycles 3)


(l 'cin)
(invec '(bbbb 0b0000000000000001))
(cycles 3)

(invec '(bbbb 0b0000000000000000))
(cycles 1)

(invec '(aaaa 0b0000000000000000))
(cycles 1)

(invec '(bbbb 0b1111111111111111))
(cycles 1)

(invec '(aaaa 0b0000100000000000))
(cycles 1)

(invec '(bbbb 0b1111011111111111))
(cycles 1)

(h 'cin)
(cycles 1)

(invec '(aaaa 0b0000000000000000))
(invec '(bbbb 0b0000000000000000))
(cycles 1)

(l 'cin)

1  (cycles 

•*5 

Dec 6 15:23 1984  chip.log  Page 1


Loading uwsim.l
Done loading uwsim.l

; 3086 nodes, transistors: enh=1494 intrinsic=0
p-chan=1141 dep=0 low-power=0 pullup=0 resis

Step begins @ 0 ns.
phi2=0 @ 0
phi1=0 @ 0
cin=0 @ 0
con5=0 @ 0
con4=0 @ 0
con3=0 @ 0
con2=0 @ 0
con1=0 @ 0
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b12=0 @ 0
b11=0 @ 0
b10=0 @ 0
b9=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
b4=0 @ 0
b3=0 @ 0
b2=0 @ 0
b1=0 @ 0
a16=0 @ 0
a15=0 @ 0
a14=0 @ 0
a13=0 @ 0
a12=0 @ 0
a11=0 @ 0
a10=0 @ 0
a9=0 @ 0
a8=0 @ 0
a7=0 @ 0
a6=0 @ 0
a5=0 @ 0
a4=0 @ 0
a3=0 @ 0
a2=0 @ 0
a1=0 @ 0

Step begins @ 10 ns.
phi1=1 @ 0

Step begins @ 35 ns.
phi1=0 @ 0

Step begins @ 45 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4

s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.4
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time = 70
clocks=0b01 cin=0 cout=X
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=X0000000000000000

Step begins @ 70 ns.
phi2=0 @ 0

Step begins @ 80 ns.
phi1=1 @ 0

Step begins @ 105 ns.
phi1=0 @ 0

Step begins @ 115 ns.
phi2=1 @ 0
cout=0 @ 22.9
state is now:
Current time = 140
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 140 ns.
phi2=0 @ 0

Step begins @ 150 ns.
phi1=1 @ 0

Step begins @ 175 ns.
phi1=0 @ 0

Step begins @ 185 ns.
phi2=1 @ 0
state is now:
Current time = 210
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000


sum=0b00000000000000000

Step begins @ 210 ns.
phi2=0 @ 0

Step begins @ 220 ns.
phi1=1 @ 0

Step begins @ 245 ns.
phi1=0 @ 0

Step begins @ 255 ns.
phi2=1 @ 0
state is now:
Current time = 280
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 280 ns.
phi2=0 @ 0

Step begins @ 290 ns.
phi1=1 @ 0

Step begins @ 315 ns.
phi1=0 @ 0

Step begins @ 325 ns.
phi2=1 @ 0
state is now:
Current time = 350
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 350 ns.
b16=1 @ 0
b15=1 @ 0
b14=1 @ 0
b13=1 @ 0
b8=1 @ 0
b7=1 @ 0
b6=1 @ 0
b5=1 @ 0
a12=1 @ 0
a11=1 @ 0
a10=1 @ 0
a9=1 @ 0
a4=1 @ 0
a3=1 @ 0
a2=1 @ 0
a1=1 @ 0
phi2=0 @ 0
Step begins @ 360 ns.
phi1=1 @ 0

Step begins @ 385 ns.
phi1=0 @ 0

Step begins @ 395 ns.
phi2=1 @ 0
state is now:
Current time = 420
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000

Step begins @ 420 ns.
phi2=0 @ 0

Step begins @ 430 ns.
phi1=1 @ 0

Step begins @ 455 ns.
phi1=0 @ 0

Step begins @ 465 ns.
phi2=1 @ 0
state is now:
Current time = 490
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000

Step begins @ 490 ns.
phi2=0 @ 0

Step begins @ 500 ns.
phi1=1 @ 0

Step begins @ 525 ns.
phi1=0 @ 0

Step begins @ 535 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.9
s10=1 @ 16.8
s8=1 @ 16.9
s6=1 @ 16.8
s4=1 @ 16.0
s2=1 @ 17
s1=1 @ 19.1
state is now:
Current time = 560
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b01111111111111111

Step begins @ 560 ns.
b9=1 @ 0
b1=1 @ 0
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
phi2=0 @ 0

Step begins @ 570 ns.
phi1=1 @ 0

Step begins @ 595 ns.
phi1=0 @ 0

Step begins @ 605 ns.
phi2=1 @ 0
state is now:
Current time = 630
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b01111111111111111

Step begins @ 630 ns.
phi2=0 @ 0

Step begins @ 640 ns.
phi1=1 @ 0

Step begins @ 665 ns.
phi1=0 @ 0

Step begins @ 675 ns.
phi2=1 @ 0
state is now:
Current time = 700
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b01111111111111111

Step begins @ 700 ns.
phi2=0 @ 0

Step begins @ 710 ns.
phi1=1 @ 0

Step begins @ 735 ns.
phi1=0 @ 0

Step begins @ 745 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time = 770
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000

Step begins @ 770 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 780 ns.
phi1=1 @ 0

Step begins @ 805 ns.
phi1=0 @ 0

Step begins @ 815 ns.
phi2=1 @ 0
state is now:
Current time = 840
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000

Step begins @ 840 ns.
phi2=0 @ 0
Step begins @ 850 ns.
phi1=1 @ 0

Step begins @ 875 ns.
phi1=0 @ 0

Step begins @ 885 ns.
phi2=1 @ 0
state is now:
Current time = 910
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000

Step begins @ 910 ns.
phi2=0 @ 0

Step begins @ 920 ns.
phi1=1 @ 0

Step begins @ 945 ns.
phi1=0 @ 0

Step begins @ 955 ns.
phi2=1 @ 0
s1=1 @ 19.1
state is now:
Current time = 980
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010001

Step begins @ 980 ns.
a16=1 @ 0
a15=1 @ 0
a14=1 @ 0
a13=1 @ 0
a8=1 @ 0
a7=1 @ 0
a6=1 @ 0
a5=1 @ 0
b9=0 @ 0
b1=0 @ 0
cin=0 @ 0
phi2=0 @ 0

Step begins @ 990 ns.
phi1=1 @ 0

Step begins @ 1015 ns.
phi1=0 @ 0

Step begins @ 1025 ns.
phi2=1 @ 0
state is now:
Current time = 1050
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b00001000000010001

Step begins @ 1050 ns.
phi2=0 @ 0

Step begins @ 1060 ns.
phi1=1 @ 0

Step begins @ 1085 ns.
phi1=0 @ 0

Step begins @ 1095 ns.
phi2=1 @ 0
state is now:
Current time = 1120
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b00001000000010001

Step begins @ 1120 ns.
phi2=0 @ 0

Step begins @ 1130 ns.
phi1=1 @ 0

Step begins @ 1155 ns.
phi1=0 @ 0

Step begins @ 1165 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.9
s4=1 @ 16.9
s2=1 @ 17
state is now:
Current time = 1190
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111
Step begins @ 1190 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 1200 ns.
phi1=1 @ 0

Step begins @ 1225 ns.
phi1=0 @ 0

Step begins @ 1235 ns.
phi2=1 @ 0
state is now:
Current time = 1260
clocks=0b01 cin=1 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 1260 ns.
phi2=0 @ 0

Step begins @ 1270 ns.
phi1=1 @ 0

Step begins @ 1295 ns.
phi1=0 @ 0

Step begins @ 1305 ns.
phi2=1 @ 0
state is now:
Current time = 1330
clocks=0b01 cin=1 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 1330 ns.
phi2=0 @ 0

Step begins @ 1340 ns.
phi1=1 @ 0

Step begins @ 1365 ns.
phi1=0 @ 0

Step begins @ 1375 ns.
phi2=1 @ 0
s16=0 @ 14.7
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
cout=1 @ 21.1
state is now:
Current time = 1400
clocks=0b01 cin=1 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 1400 ns.
b1=1 @ 0
cin=0 @ 0
phi2=0 @ 0

Step begins @ 1410 ns.
phi1=1 @ 0

Step begins @ 1435 ns.
phi1=0 @ 0

Step begins @ 1445 ns.
phi2=1 @ 0
state is now:
Current time = 1470
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1470 ns.
phi2=0 @ 0

Step begins @ 1480 ns.
phi1=1 @ 0

Step begins @ 1505 ns.
phi1=0 @ 0

Step begins @ 1515 ns.
phi2=1 @ 0
state is now:
Current time = 1540
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1540 ns.
phi2=0 @ 0
Step begins @ 1550 ns.
phi1=1 @ 0

Step begins @ 1575 ns.
phi1=0 @ 0

Step begins @ 1585 ns.
phi2=1 @ 0
state is now:
Current time = 1610
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1610 ns.
b1=0 @ 0
phi2=0 @ 0

Step begins @ 1620 ns.
phi1=1 @ 0

Step begins @ 1645 ns.
phi1=0 @ 0

Step begins @ 1655 ns.
phi2=1 @ 0
state is now:
Current time = 1680
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 1680 ns.
a16=0 @ 0
a15=0 @ 0
a14=0 @ 0
a13=0 @ 0
a12=0 @ 0
a11=0 @ 0
a10=0 @ 0
a9=0 @ 0
a8=0 @ 0
a7=0 @ 0
a6=0 @ 0
a5=0 @ 0
a4=0 @ 0
a3=0 @ 0
a2=0 @ 0
a1=0 @ 0
phi2=0 @ 0

Step begins @ 1690 ns.
phi1=1 @ 0
Step begins @ 1715 ns.
phi1=0 @ 0

Step begins @ 1725 ns.
phi2=1 @ 0
state is now:
Current time = 1750
clocks=0b01 cin=0 cout=1
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 1750 ns.
b16=1 @ 0
b15=1 @ 0
b14=1 @ 0
b13=1 @ 0
b12=1 @ 0
b11=1 @ 0
b10=1 @ 0
b9=1 @ 0
b8=1 @ 0
b7=1 @ 0
b6=1 @ 0
b5=1 @ 0
b4=1 @ 0
b3=1 @ 0
b2=1 @ 0
b1=1 @ 0
phi2=0 @ 0

Step begins @ 1760 ns.
phi1=1 @ 0

Step begins @ 1785 ns.
phi1=0 @ 0

Step begins @ 1795 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
cout=0 @ 22.9
state is now:
Current time = 1820
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b1111111111111111
sum=0b01111111111111111

Step begins @ 1820 ns.
a12=1 @ 0
phi2=0 @ 0

Step begins @ 1830 ns.
phi1=1 @ 0

Step begins @ 1855 ns.
phi1=0 @ 0

Step begins @ 1865 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 18.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time = 1890
clocks=0b01 cin=0 cout=0
aaaa=0b0000100000000000
bbbb=0b1111111111111111
sum=0b00000000000000000

Step begins @ 1890 ns.
b12=0 @ 0
phi2=0 @ 0

Step begins @ 1900 ns.
phi1=1 @ 0

Step begins @ 1925 ns.
phi1=0 @ 0

Step begins @ 1935 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
state is now:
Current time = 1960
clocks=0b01 cin=0 cout=0
aaaa=0b0000100000000000
bbbb=0b1111011111111111
sum=0b01111111111111111

Step begins @ 1960 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 1970 ns.
phi1=1 @ 0

Step begins @ 1995 ns.
phi1=0 @ 0

Step begins @ 2005 ns.
phi2=1 @ 0
s16=0 @ 14.2
s13=0 @ 16.4
s15=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
cout=1 @ 21.1
state is now:
Current time = 2030
clocks=0b01 cin=1 cout=1
aaaa=0b0000100000000000
bbbb=0b1111011111111111
sum=0b10000011111111111

Step begins @ 2030 ns.
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b11=0 @ 0
b10=0 @ 0
b9=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
b4=0 @ 0
b3=0 @ 0
b2=0 @ 0
b1=0 @ 0
a12=0 @ 0
phi2=0 @ 0

Step begins @ 2040 ns.
phi1=1 @ 0

Step begins @ 2065 ns.
phi1=0 @ 0

Step begins @ 2075 ns.
phi2=1 @ 0
s16=1 @ 14.4
s13=1 @ 16.7
s15=1 @ 16.7
s14=1 @ 16.9
s12=1 @ 16.?
cout=0 @ 22.9
state is now:
Current time= 2100
clocks=0b01 cin=1 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 2100 ns.
cin=0 @ 0
phi2=0 @ 0

Step begins @ 2110 ns.
phi1=1 @ 0

Step begins @ 2135 ns.
phi1=0 @ 0

Step begins @ 2145 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
cout=1 @ 21.1
state is now:
Current time= 2170
clocks=0b01 cin=0 cout=1
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 2170 ns.
phi2=0 @ 0

Step begins @ 2180 ns.
phi1=1 @ 0

Step begins @ 2205 ns.
phi1=0 @ 0

Step begins @ 2215 ns.
phi2=1 @ 0
cout=0 @ 22.9
state is now:
Current time= 2240
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 2240 ns.
phi2=0 @ 0

Step begins @ 2250 ns.
phi1=1 @ 0

Step begins @ 2275 ns.
phi1=0 @ 0

Step begins @ 2285 ns.
phi2=1 @ 0
s1=0 @ 20
state is now:
Current time= 2310
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 2310 ns.
phi2=0 @ 0

Step begins @ 2320 ns.
phi1=1 @ 0

Step begins @ 2345 ns.
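The log can also be cross-checked mechanically. The following is a minimal Python sketch, assuming each line takes one of the forms `Step begins @ T ns.`, `node=value @ delay` or `sum=0b...`, and that the 17-bit sum display lists cout followed by s16 through s1 (the state displays at 1820 and 2170 ns read that way). The excerpt reproduces the steps from 1960 to 2005 ns; `replay` and its regular expressions are hypothetical helpers, not part of the original simulation tools. It applies the reported transitions to the state shown at 1960 ns, confirms that they reproduce the sum printed at 2030 ns, and records when each node last switched (step time plus reported delay).

```python
import re

# Assumed line formats, based on the printed chip.log listing.
STEP_RE = re.compile(r"Step begins @ (\d+) ns\.")
EVENT_RE = re.compile(r"(\w+)=([01]) @ ([\d.]+)")
SUM_RE = re.compile(r"sum=0b([01]{17})")

# Excerpt: the steps at 1960-2005 ns and the state printed at 2030 ns.
LOG = """
Step begins @ 1960 ns.
cin=1 @ 0
phi2=0 @ 0
Step begins @ 1970 ns.
phi1=1 @ 0
Step begins @ 1995 ns.
phi1=0 @ 0
Step begins @ 2005 ns.
phi2=1 @ 0
s16=0 @ 14.2
s13=0 @ 16.4
s15=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
cout=1 @ 21.1
state is now:
Current time= 2030
clocks=0b01 cin=1 cout=1
aaaa=0b0000100000000000
bbbb=0b1111011111111111
sum=0b10000011111111111
"""


def replay(log_text, initial):
    """Apply node transitions in order and compare every printed sum bus
    (cout followed by s16..s1) with the value rebuilt from the transitions."""
    nodes = dict(initial)
    step = None
    latest = {}  # node -> absolute time (ns) of its last reported transition
    mismatches = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if m := STEP_RE.fullmatch(line):
            step = int(m.group(1))
        elif m := SUM_RE.fullmatch(line):
            rebuilt = str(nodes["cout"]) + "".join(
                str(nodes[f"s{i}"]) for i in range(16, 0, -1))
            if rebuilt != m.group(1):
                mismatches.append((step, rebuilt, m.group(1)))
        elif m := EVENT_RE.fullmatch(line):
            name, value, delay = m.group(1), int(m.group(2)), float(m.group(3))
            nodes[name] = value
            latest[name] = step + delay
    return nodes, latest, mismatches


# State printed at 1960 ns: cout=0 and every sum bit s1..s16 equal to 1.
start = {"cout": 0, **{f"s{i}": 1 for i in range(1, 17)}}
nodes, latest, mismatches = replay(LOG, start)
print(mismatches)  # []
print(round(latest["cout"], 1))
```

Lines that match none of the three patterns (the clock and input bus displays, for example) are simply skipped, so the same loop can be run over a longer stretch of the listing once a starting state is supplied.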


REPRODUCED  AT  GOVERNMENT  EXPENSE 

APPENDIX F
TEST  VECTORS 


    Addend A           Addend B      Cin        Sum
msb----------lsb   msb----------lsb        msb-----------lsb


initialize all internal nodes

0000000000000000   0000000000000000   0    xxxxxxxxxxxxxxxxx
0000000000000000   0000000000000000   0    xxxxxxxxxxxxxxxxx
0000000000000000   0000000000000000   0    00000000000000000

test for proper P and G primitives

0000000000000000   1111111111111111   0    01111111111111111
1111111111111111   0000000000000000   0    01111111111111111
0101010101010101   1010101010101010   0    01111111111111111
1010101010101010   0101010101010101   0    01111111111111111

test for proper IES

0001000100010001   0000000000000000   0    00001000100010001
0001000100010001   0001000100010001   0    00010001000100010
0101010101010101   0001000100010001   0    00110011001100110
0101010101010101   0101010101010101   0    01010101010101010

test for proper IC23

0101010101010101   0011001100110011   0    01000100010001000
0010001000100010   0011001100110011   0    00101010101010101

test for carry from block to block

0000000000001111   0000000000000001   0    00000000000010000
0000000000001111   0000000000000000   1    00000000000010000
0000000011111111   0000000000000000   1    00000000100000000
0000000011111111   0000000000000001   0    00000000100000000
0000000011111111   0000000000010000   0    00000000100001111
0000111111111111   0000000000000000   1    00001000000000000
0000111111111111   0000000000000001   0    00001000000000000
0000111111111111   0000000000010000   0    00001000000001111
0000111111111111   0000000100000000   0    00001000011111111
1111111111111111   0000000000000000   1    10000000000000000
1111111111111111   0000000000000001   0    10000000000000000
1111111111111111   0000000000010000   0    10000000000001111
1111111111111111   0000000100000000   0    10000000011111111
1111111111111111   0001000000000000   0    10000111111111111
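Every defined row in the test vector table can be checked against ordinary binary addition: the expected Sum is the 17-bit value of A + B + Cin, written msb first with the carry out as its leading bit. The following Python sketch recomputes each result; the names `VECTORS` and `reference_sum` are illustrative, not from the original test setup, and the check covers only the vector table itself, not the simulated circuit. The two initialization rows whose sum is x (unknown) are left out.

```python
# Test vectors from the table: (A, B, Cin, expected 17-bit Sum), msb first.
VECTORS = [
    # initialize all internal nodes (defined row only)
    ("0000000000000000", "0000000000000000", 0, "00000000000000000"),
    # test for proper P and G primitives
    ("0000000000000000", "1111111111111111", 0, "01111111111111111"),
    ("1111111111111111", "0000000000000000", 0, "01111111111111111"),
    ("0101010101010101", "1010101010101010", 0, "01111111111111111"),
    ("1010101010101010", "0101010101010101", 0, "01111111111111111"),
    # test for proper IES
    ("0001000100010001", "0000000000000000", 0, "00001000100010001"),
    ("0001000100010001", "0001000100010001", 0, "00010001000100010"),
    ("0101010101010101", "0001000100010001", 0, "00110011001100110"),
    ("0101010101010101", "0101010101010101", 0, "01010101010101010"),
    # test for proper IC23
    ("0101010101010101", "0011001100110011", 0, "01000100010001000"),
    ("0010001000100010", "0011001100110011", 0, "00101010101010101"),
    # test for carry from block to block
    ("0000000000001111", "0000000000000001", 0, "00000000000010000"),
    ("0000000000001111", "0000000000000000", 1, "00000000000010000"),
    ("0000000011111111", "0000000000000000", 1, "00000000100000000"),
    ("0000000011111111", "0000000000000001", 0, "00000000100000000"),
    ("0000000011111111", "0000000000010000", 0, "00000000100001111"),
    ("0000111111111111", "0000000000000000", 1, "00001000000000000"),
    ("0000111111111111", "0000000000000001", 0, "00001000000000000"),
    ("0000111111111111", "0000000000010000", 0, "00001000000001111"),
    ("0000111111111111", "0000000100000000", 0, "00001000011111111"),
    ("1111111111111111", "0000000000000000", 1, "10000000000000000"),
    ("1111111111111111", "0000000000000001", 0, "10000000000000000"),
    ("1111111111111111", "0000000000010000", 0, "10000000000001111"),
    ("1111111111111111", "0000000100000000", 0, "10000000011111111"),
    ("1111111111111111", "0001000000000000", 0, "10000111111111111"),
]


def reference_sum(a: str, b: str, cin: int) -> str:
    """Return A + B + Cin as 17 binary digits: carry out, then sum bits 16..1."""
    return format(int(a, 2) + int(b, 2) + cin, "017b")


failures = [v for v in VECTORS if reference_sum(v[0], v[1], v[2]) != v[3]]
print(f"{len(VECTORS)} vectors checked, {len(failures)} failures")
# prints "25 vectors checked, 0 failures"
```

The "carry from block to block" rows are built so that a single added bit ripples through an all-ones run of 4, 8, 12 or 16 bits, which is why their sums differ only in where the carry lands.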